Page 409 - Jolliffe I. Principal Component Analysis
14. Generalizations and Adaptations of Principal Component Analysis
14.1 Non-Linear Extensions of Principal
Component Analysis
One way of introducing non-linearity into PCA is what Gnanadesikan
(1977) calls ‘generalized PCA.’ This extends the vector of $p$
variables $x$ to include functions of the elements of $x$. For example,
if $p = 2$, so $x = (x_1, x_2)$, we could consider linear functions of
$x_+ = (x_1, x_2, x_1^2, x_2^2, x_1 x_2)$ that have maximum variance, rather than
restricting attention to linear functions of $x$. In theory, any functions
$g_1(x_1, x_2, \ldots, x_p), g_2(x_1, x_2, \ldots, x_p), \ldots, g_h(x_1, x_2, \ldots, x_p)$
of $x_1, x_2, \ldots, x_p$
could be added to the original vector $x$, in order to construct an extended
vector $x_+$ whose PCs are then found. In practice, however, Gnanadesikan
(1977) concentrates on quadratic functions, so that the analysis is a procedure
for finding quadratic rather than linear functions of $x$ that maximize
variance.
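The construction for $p = 2$ can be sketched in a few lines of Python. This is a minimal illustration, not Gnanadesikan's own procedure: the helper names `quadratic_extension` and `pca`, the simulated data, and the use of the covariance (rather than correlation) matrix are all assumptions made for the example.

```python
import numpy as np

def quadratic_extension(X):
    """Map each row (x1, x2) to the extended vector (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

def pca(X):
    """Eigenvalues (descending) and eigenvectors of the sample covariance matrix."""
    S = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Simulated bivariate data (an assumption, purely for illustration).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([1.0, 2.0], [[1.0, 0.6], [0.6, 2.0]], size=500)

X_plus = quadratic_extension(X)               # n x 5 extended data matrix
eigvals, eigvecs = pca(X_plus)

# The first column of eigvecs holds the coefficients of the quadratic
# function of (x1, x2) with maximum sample variance, subject to unit length.
first_pc_coefficients = eigvecs[:, 0]
```

Because $x_+$ contains $x$ itself, the leading variance found here can never be smaller than that of the first ordinary PC. The elements of $x_+$ are on quite different scales, so in practice some standardization may be preferable; that choice is left open in this sketch.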
An obvious alternative to Gnanadesikan’s (1977) proposal is to replace
x by a function of x, rather than add to x as in Gnanadesikan’s analysis.
Transforming x in this way might be appropriate, for example, if we are
interested in products of powers of the elements of x. In this case, taking
logarithms of the elements and doing a PCA on the transformed data provides
a suitable analysis. Another possible use of transforming to non-linear PCs
is to detect near-constant, non-linear relationships between the variables. If
an appropriate transformation is made, such relationships will be detected
by the last few PCs of the transformed data. Transforming the data is
suggested before doing a PCA for allometric data (see Section 13.2) and for
compositional data (Section 13.3). Kazmierczak (1985) also advocates
logarithmic transformation followed by double-centering (see Section 14.2.3)
for data in which it is important for a PCA to be invariant to changes in
the units of measurement and to the choice of which measurement is used
as a ‘reference.’ However, as noted in the introduction to Chapter 4,
transformation of variables should only be undertaken, in general, after careful
thought about whether it is appropriate for the data set at hand.
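As a rough illustration of the two uses of transformation described above, the following Python sketch simulates three positive variables for which the product of powers $x_1 x_2^2 / x_3$ is nearly constant, takes logarithms, and inspects the last PC. The data, the exponents, and the noise level are assumptions chosen for the example, not taken from any data set discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Simulated positive variables (assumed for illustration) satisfying
# x3 = 3 * x1 * x2^2 up to a small multiplicative error.
x1 = rng.lognormal(mean=0.0, sigma=0.5, size=n)
x2 = rng.lognormal(mean=0.0, sigma=0.5, size=n)
error = np.exp(0.01 * rng.standard_normal(n))
x3 = 3.0 * x1 * x2**2 * error
X = np.column_stack([x1, x2, x3])

# After taking logs the relationship becomes linear:
# log x1 + 2 log x2 - log x3 is approximately constant.
Z = np.log(X)
S = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order

# The last PC (smallest variance) picks out the near-constant relationship.
last_pc = eigvecs[:, 0]
last_pc = last_pc / last_pc[0]         # rescale so the first coefficient is 1

print("smallest eigenvalue:", eigvals[0])
print("last PC coefficients (rescaled):", last_pc)
```

With these settings the rescaled coefficients of the last PC should lie close to $(1, 2, -1)$, which corresponds to the product $x_1 x_2^2 / x_3$ on the original scale. A PCA of the untransformed variables would not be expected to isolate this relationship in a single low-variance component.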
14.1.1 Non-Linear Multivariate Data Analysis—Gifi and
Related Approaches
The most extensively developed form of non-linear multivariate data
analysis in general, and non-linear PCA in particular, is probably the Gifi
(1990) approach. ‘Albert Gifi’ is the nom de plume of the members of the
Department of Data Theory at the University of Leiden. As well as the
1990 book, the Gifi contributors have published widely on their system
of multivariate analysis since the 1970s, mostly under their own names.
Much of it is not easy reading. Here we attempt only to outline the
approach. A rather longer, accessible, description is provided by Krzanowski

