Page 409 - Jolliffe I. Principal Component Analysis

14. Generalizations and Adaptations of Principal Component Analysis
                              14.1 Non-Linear Extensions of Principal
                                      Component Analysis
                              One way of introducing non-linearity into PCA is what Gnanadesikan
                              (1977) calls ‘generalized PCA.’ This extends the vector of p
                              variables x to include functions of the elements of x. For example,
                              if $p = 2$, so $x = (x_1, x_2)$, we could consider linear functions of
                              $x_+ = (x_1, x_2, x_1^2, x_2^2, x_1 x_2)$ that have maximum variance, rather than
                              restricting attention to linear functions of x. In theory, any functions

                              $g_1(x_1, x_2, \ldots, x_p), g_2(x_1, x_2, \ldots, x_p), \ldots, g_h(x_1, x_2, \ldots, x_p)$ of $x_1, x_2, \ldots, x_p$
                              could be added to the original vector x, in order to construct an extended
                              vector $x_+$ whose PCs are then found. In practice, however, Gnanadesikan
                              (1977) concentrates on quadratic functions, so that the analysis is a proce-
                              dure for finding quadratic rather than linear functions of x that maximize
                              variance.
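
The construction of the extended vector can be illustrated with a short numerical sketch. This is not Gnanadesikan's own procedure, only a minimal illustration for p = 2 under stated assumptions: the data are synthetic bivariate normal draws, the PCs are taken from the sample covariance matrix, and the helper names `quadratic_extension` and `pc_variances_and_loadings` are hypothetical.

```python
import numpy as np

def pc_variances_and_loadings(data):
    """Eigen-decompose the sample covariance matrix, largest variance first."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def quadratic_extension(x):
    """Map each row (x1, x2) to the extended vector (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=300)

x_plus = quadratic_extension(x)
eigvals_x, _ = pc_variances_and_loadings(x)
eigvals_plus, loadings_plus = pc_variances_and_loadings(x_plus)

# Scores on the first PC of the extended vector: a quadratic function of
# (x1, x2) with maximum sample variance among unit-length loading vectors.
quadratic_pc1_scores = (x_plus - x_plus.mean(axis=0)) @ loadings_plus[:, 0]
```

Because x is a sub-vector of the extended vector, the leading variance in `eigvals_plus` cannot be smaller than the leading variance in `eigvals_x`. The added squared and product terms are on different scales from the original variables, so a covariance-based analysis like this one is sensitive to that choice.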
                                An obvious alternative to Gnanadesikan’s (1977) proposal is to replace
                              x by a function of x, rather than add to x as in Gnanadesikan’s analysis.
                              Transforming x in this way might be appropriate, for example, if we are
                              interested in products of powers of the elements of x. In this case, taking log-
                              arithms of the elements and doing a PCA on the transformed data provides
                              a suitable analysis. Another possible use of transforming to non-linear PCs
                              is to detect near-constant, non-linear relationships between the variables. If
                              an appropriate transformation is made, such relationships will be detected
                              by the last few PCs of the transformed data. Transforming the data is sug-
                              gested before doing a PCA for allometric data (see Section 13.2) and for
                              compositional data (Section 13.3). Kazmierczak (1985) also advocates log-
                              arithmic transformation followed by double-centering (see Section 14.2.3)
                              for data in which it is important for a PCA to be invariant to changes in
                              the units of measurement and to the choice of which measurement is used
                              as a ‘reference.’ However, as noted in the introduction to Chapter 4, trans-
                              formation of variables should only be undertaken, in general, after careful
                              thought about whether it is appropriate for the data set at hand.
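
Two of the ideas in this paragraph can be sketched numerically. The first part below builds synthetic data satisfying a near-constant product-of-powers relationship (here x1 times x2 squared is roughly 5, an arbitrary choice) and shows that the last PC of the log-transformed data picks it out. The second part applies a logarithmic transformation followed by double-centering, taken here to mean subtracting row and column means and adding back the overall mean; that reading, the helper name `log_double_center`, and all the data are assumptions made for illustration, not a reproduction of Kazmierczak's (1985) analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Part 1: a near-constant non-linear relationship, x1 * x2**2 ~= 5.
n = 400
x2 = rng.uniform(1.0, 3.0, n)
noise = np.exp(0.01 * rng.standard_normal(n))  # small multiplicative error
x1 = 5.0 * noise / x2**2
log_data = np.log(np.column_stack([x1, x2]))

eigvals, eigvecs = np.linalg.eigh(np.cov(log_data, rowvar=False))  # ascending
last_pc = eigvecs[:, 0]                    # smallest-variance direction
last_pc = last_pc * np.sign(last_pc[0])    # fix the arbitrary sign
# last_pc is close to (1, 2)/sqrt(5): log(x1) + 2*log(x2) is nearly constant.

# Part 2: logarithms followed by double-centering (assumed definition).
def log_double_center(x):
    """Take logs, then remove row means and column means, adding back the grand mean."""
    z = np.log(x)
    return z - z.mean(axis=0, keepdims=True) - z.mean(axis=1, keepdims=True) + z.mean()

measurements = rng.lognormal(size=(50, 3))           # hypothetical positive data
base = log_double_center(measurements)
rescaled = log_double_center(measurements * np.array([1000.0, 0.01, 2.54]))
relative = log_double_center(measurements / measurements[:, [0]])
```

A change of units multiplies a column by a constant, which becomes an additive column constant after taking logarithms and is removed by column-centering. Dividing every row by a reference measurement becomes an additive row constant and is removed by row-centering, so `base`, `rescaled` and `relative` agree up to rounding error.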


                              14.1.1 Non-Linear Multivariate Data Analysis—Gifi and
                                      Related Approaches
                              The most extensively developed form of non-linear multivariate data anal-
                              ysis in general, and non-linear PCA in particular, is probably the Gifi
                              (1990) approach. ‘Albert Gifi’ is the nom de plume of the members of the
                              Department of Data Theory at the University of Leiden. As well as the
                              1990 book, the Gifi contributors have published widely on their system
                              of multivariate analysis since the 1970s, mostly under their own names.
                              Much of it is not easy reading. Here we attempt only to outline the ap-
                              proach. A rather longer, accessible, description is provided by Krzanowski