Page 410 - Jolliffe I. Principal Component Analysis

14.1. Non-Linear Extensions of Principal Component Analysis
                              and Marriott (1994, Chapter 8), and Michailidis and de Leeuw (1998) give
                              a review.
                                Gifi’s (1990) form of non-linear PCA is based on a generalization of the
                              result that if, for an (n × p) data matrix X, we minimize

                                  $$\operatorname{tr}\{(X - YB')'(X - YB')\}, \qquad (14.1.1)$$

                              with respect to the (n × q) matrix Y whose columns are linear functions of
                              columns of X, and with respect to the (q × p) matrix B′ where the columns
                              of B are orthogonal, then the optimal Y consists of the values (scores) of
                              the first q PCs for the n observations, and the optimal matrix B consists
                              of the coefficients of the first q PCs. The criterion (14.1.1) corresponds to
                              that used in the sample version of Property A5 (see Section 2.1), and can
                              be rewritten as
                                                                         
                                  $$\operatorname{tr}\left\{\sum_{j=1}^{p} (x_j - Yb_j)'(x_j - Yb_j)\right\}, \qquad (14.1.2)$$

                              where x_j, b_j are the jth columns of X, B′, respectively.
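The result above can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated, column-centred data; the function names `criterion` and `criterion_by_column` are illustrative and not from the source. It evaluates the criterion in both its matrix form (14.1.1) and its column-wise form (14.1.2), and compares the PCA solution with an arbitrary orthonormal choice of B.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 5, 2

# Illustrative data (an assumption): correlated columns, then column-centred.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)

# Coefficients of the first q PCs: leading right singular vectors of X.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
B = Vt[:q].T   # (p x q), orthonormal columns
Y = X @ B      # (n x q) scores on the first q PCs


def criterion(X, Y, B):
    """Matrix form (14.1.1): tr{(X - Y B')'(X - Y B')}."""
    R = X - Y @ B.T
    return np.trace(R.T @ R)


def criterion_by_column(X, Y, B):
    """Column-wise form (14.1.2): sum over j of (x_j - Y b_j)'(x_j - Y b_j)."""
    Bt = B.T  # b_j is the jth column of B'
    total = 0.0
    for j in range(X.shape[1]):
        r = X[:, j] - Y @ Bt[:, j]
        total += r @ r
    return total


pca_loss = criterion(X, Y, B)

# Comparison: a random orthonormal B, with Y = XB as the matching scores.
Q, _ = np.linalg.qr(rng.normal(size=(p, q)))
random_loss = criterion(X, X @ Q, Q)

print("PCA loss:            ", pca_loss)
print("column-wise form:    ", criterion_by_column(X, Y, B))
print("sum of discarded s^2:", np.sum(s[q:] ** 2))
print("random orthonormal B:", random_loss)
```

In this sketch the two forms of the criterion agree, the PCA loss equals the sum of the squared singular values that are discarded, and the random orthonormal B does no better than the PCA solution.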
                                Gifi’s (1990) version of non-linear PCA is designed for categorical variables
                              so that there are no immediate values of x_j to insert in (14.1.2). Any
                              variables that are continuous are first converted to categories; then values
                              need to be derived for each category of every variable. We can express this
                              algebraically as the process minimizing
                                                                            
                                  $$\operatorname{tr}\left\{\sum_{j=1}^{p} (G_j c_j - Yb_j)'(G_j c_j - Yb_j)\right\}, \qquad (14.1.3)$$

                              where G_j is an (n × g_j) indicator matrix whose (h, i)th value is unity if
                              the hth observation is in the ith category of the jth variable and is zero
                              otherwise, and c_j is a vector of length g_j containing the values assigned
                              to the g_j categories of the jth variable. The minimization takes place with
                              respect to both c_j and Yb_j, so that the difference from (linear) PCA is
                              that there is optimization over the values of the variables in addition to
                              optimization of the scores on the q components. The solution is found by
                              an alternating least squares (ALS) algorithm which alternately fixes the
                              c_j and minimizes with respect to the Yb_j, then fixes the Yb_j at the new
                              values and minimizes with respect to the c_j, fixes the c_j at the new values
                              and minimizes over Yb_j, and so on until convergence. This is implemented
                              by the Gifi-written PRINCALS computer program (Gifi, 1990, Section 4.6),
                              which is incorporated in the SPSS software.
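The alternating scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the PRINCALS implementation. It rests on an assumption the text does not state: each quantified variable G_j c_j is centred and scaled to unit variance after every update, which rules out the trivial solution with all c_j and Y equal to zero. It also assumes every variable has at least two observed categories. The function names and the simulated data are hypothetical.

```python
import numpy as np


def indicator(codes_j):
    """(n x g_j) indicator matrix G_j for one categorical column."""
    cats, idx = np.unique(codes_j, return_inverse=True)
    G = np.zeros((codes_j.shape[0], cats.shape[0]))
    G[np.arange(codes_j.shape[0]), idx] = 1.0
    return G


def standardize(z):
    """Centre z and scale to unit variance (assumed normalization)."""
    z = z - z.mean()
    return z / np.sqrt(np.mean(z ** 2))


def rank_q_fit(Z, q):
    """With the c_j fixed, the best Y and B are the first q PCs of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :q] * s[:q], Vt[:q].T


def nonlinear_pca_als(codes, q, max_iter=200, tol=1e-10):
    n, p = codes.shape
    Gs = [indicator(codes[:, j]) for j in range(p)]
    # Starting values: equally spaced category values 0, 1, ..., g_j - 1.
    Z = np.column_stack(
        [standardize(G @ np.arange(G.shape[1], dtype=float)) for G in Gs]
    )
    Y, B = rank_q_fit(Z, q)
    losses = [np.sum((Z - Y @ B.T) ** 2)]  # criterion (14.1.3)
    for _ in range(max_iter):
        fit = Y @ B.T  # column j holds Y b_j
        # Fix the Y b_j; least squares for c_j gives category means of Y b_j.
        for j, G in enumerate(Gs):
            c = (G.T @ fit[:, j]) / G.sum(axis=0)
            z = G @ c
            if np.std(z) > 1e-12:  # keep the old values if the update is flat
                Z[:, j] = standardize(z)
        # Fix the c_j; minimize over Y b_j.
        Y, B = rank_q_fit(Z, q)
        losses.append(np.sum((Z - Y @ B.T) ** 2))
        if losses[-2] - losses[-1] < tol:
            break
    # Z[:, j] is constant within each category, so category means recover c_j.
    c_list = [(G.T @ Z[:, j]) / G.sum(axis=0) for j, G in enumerate(Gs)]
    return Y, B, c_list, losses


# Simulated categorical data (an assumption, for illustration only).
rng = np.random.default_rng(1)
n = 300
latent = rng.normal(size=n)
codes = np.column_stack([
    np.digitize(latent + 0.5 * rng.normal(size=n), [-0.5, 0.5]),
    np.digitize(-latent + 0.5 * rng.normal(size=n), [-1.0, 0.0, 1.0]),
    np.digitize(latent ** 2 + 0.5 * rng.normal(size=n), [0.5, 1.5]),
    np.digitize(rng.normal(size=n), [0.0]),
])

Y, B, c_list, losses = nonlinear_pca_als(codes, q=2)
print("iterations:", len(losses) - 1)
print("initial loss:", losses[0], "final loss:", losses[-1])
for j, c in enumerate(c_list):
    print(f"category values for variable {j}:", np.round(c, 3))
```

Under the assumed normalization, each half-step solves its own least squares problem exactly, so the recorded loss should not increase from one iteration to the next. The third simulated variable depends on the square of the latent variable, so its estimated category values need not be monotone in the category codes.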
                                A version of non-linear PCA also appears in another guise within the
                              Gifi system. For two categorical variables we have a contingency table that
                              can be analysed by correspondence analysis (Section 13.1). For more than
                              two categorical variables there is an extension of correspondence analysis,
                              called multiple correspondence analysis (see Section 13.1 and Greenacre,