Page 411 - Jolliffe I. Principal Component Analysis


14. Generalizations and Adaptations of Principal Component Analysis
1984, Chapter 5). This technique has at its core the idea of assigning scores
to each category of each variable. It can be shown that if a PCA is done
on the correlation matrix of these scores, the first PC is equivalent to the
first non-trivial multiple correspondence analysis dimension (Bekker and
de Leeuw, 1988). These authors give further discussion of the relationships
between the different varieties of non-linear PCA.
Mori et al. (1998) combine the Gifi approach with the procedure described
by Tanaka and Mori (1997) for selecting a subset of variables (see
Section 6.3). Using the optimal values of the variables $c_j$ found by
minimizing (14.1.3), variables are selected in the same way as in Tanaka and
Mori (1997). The results can be thought of either as an extension of Tanaka
and Mori’s method to qualitative data, or as a simplification of Gifi’s
non-linear PCA by using only a subset of variables.
An approach that overlaps with—but differs from—the main Gifi ideas
underlying non-linear PCA is described by Meulman (1986). Categorical
data are again transformed to give optimal scores or values for each
category of each variable, and simultaneously a small number of optimal
dimensions is found within which to represent these scores. The
‘non-linearity’ of the technique becomes more obvious when a continuous
variable is fitted into this framework by first dividing its range of values
into a finite number of categories and then assigning a value to each
category. The non-linear transformation is thus a step function. Meulman’s
(1986) proposal, which is known as the distance approach to nonlinear
multivariate data analysis, differs from the main Gifi (1990) framework
by using different optimality criteria (loss functions) instead of (14.1.3).
Gifi’s (1990) algorithms concentrate on the representation of the variables
in the analysis, so that representation of the objects (observations) can
be suboptimal. The distance approach directly approximates distances
between objects. Krzanowski and Marriott (1994, Chapter 8) give a readable
introduction to, and an example of, the distance approach.
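
The step-function transformation of a continuous variable described above can be illustrated with a minimal Python sketch. The function name, the cut points, and the category values below are hypothetical and chosen only for illustration; in the methods discussed, the category values would be found by optimizing a loss function rather than fixed by hand.

```python
import numpy as np


def step_transform(x, cut_points, category_values):
    """Map a continuous variable onto category values via a step function.

    cut_points: increasing interior boundaries; k - 1 boundaries give k categories.
    category_values: the k values assigned to the categories (hypothetical
    here; an optimal-scaling method would estimate them from the data).
    """
    x = np.asarray(x, dtype=float)
    cut_points = np.asarray(cut_points, dtype=float)
    category_values = np.asarray(category_values, dtype=float)
    if category_values.size != cut_points.size + 1:
        raise ValueError("need one value per category: len(cut_points) + 1")
    # np.digitize gives each observation a category index in 0..k-1
    categories = np.digitize(x, cut_points)
    # Every observation in a category receives that category's value
    return category_values[categories]


# Illustrative data: three categories split at 1.0 and 3.0
x = np.array([0.2, 1.5, 2.7, 3.9, 5.1])
cut_points = [1.0, 3.0]
category_values = [-1.0, 0.3, 2.0]
transformed = step_transform(x, cut_points, category_values)
```

All observations falling in the same interval receive the same value, so the transformed variable is piecewise constant in the original one, which is the sense in which the transformation is non-linear.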
An example of Gifi non-linear PCA applied in an agricultural context
and involving a mixture of categorical and numerical variables is given by
Kroonenberg et al. (1997). Michailidis and de Leeuw (1998) discuss various
aspects of stability for Gifi-based methods, and Verboon (1993) describes
a robust version of a Gifi-like procedure.
A sophisticated way of replacing the variables by functions of the
variables, and hence incorporating non-linearity, is described by Besse and
Ferraty (1995). It is based on an adaptation of the fixed effects model
which was introduced in Section 3.9. The adaptation is that, whereas before
we had $E(x_i) = z_i$, now $E[f(x_i)] = z_i$, where $f(x_i)$ is a
$p$-dimensional vector of functions of $x_i$. As before, $z_i$ lies in a
$q$-dimensional subspace $F_q$, but $\mathrm{var}(e_i)$ is restricted to be
$\sigma^2 I_p$. The quantity to be minimized is similar to (3.9.1) with $x_i$
replaced by $f(x_i)$. In the current problem it is necessary to choose $q$
and then optimize with respect to the $q$-dimensional subspace $F_q$ and with
respect to the functions $f(\cdot)$. The functions must be restricted
