Page 438 - Jolliffe I. Principal Component Analysis

14.6. Miscellanea
                                                                                            403
14.6.3 Regression Components, Sweep-out Components and Extended Components
Ottestad (1975) proposed an alternative to PCA that he called regression components. He developed it for standardized variables, so it is a correlation-based analysis. The new variables, or regression components, $y_1, y_2, \ldots, y_p$ are defined in terms of the original (standardized) variables $x_1, x_2, \ldots, x_p$ as

$$y_1 = x_1,\quad y_2 = x_2 - b_{21}x_1,\quad y_3 = x_3 - b_{31}x_1 - b_{32}x_2,\quad \ldots,$$
$$y_p = x_p - b_{p1}x_1 - b_{p2}x_2 - \cdots - b_{p(p-1)}x_{p-1},$$

where $b_{jk}$ is the regression coefficient of $x_k$ in a regression of $x_j$ on all the other variables on the right-hand side of the equation defining $y_j$. It should be stressed that the labelling in these defining equations has been chosen for simplicity to correspond to the order in which the $y$ variables are defined. It will usually differ from the labelling of the data as originally recorded.
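
Because each $b_{jk}$ comes from regressing $x_j$ on the variables placed before it, $y_j$ is simply the residual from that regression. The following minimal sketch uses standard-library Python. It assumes a nonsingular correlation matrix whose rows and columns are already in the chosen order, indexed from 0. For each $j$ it solves the normal equations and builds the matrix $T$ with $y = Tx$. The helper names `solve`, `regression_components` and `covariance_of_components` are hypothetical, not Ottestad's.

```python
"""Regression components for variables already placed in the chosen order.

Sketch only: assumes a nonsingular correlation matrix and 0-based indices.
"""
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def regression_components(corr: Matrix) -> Matrix:
    """Return T such that y = T x; row j holds 1 on the diagonal and -b_jk for k < j."""
    p = len(corr)
    t = [[0.0] * p for _ in range(p)]
    for j in range(p):
        t[j][j] = 1.0
        if j == 0:
            continue  # y_1 = x_1: nothing to regress on
        # Normal equations for regressing variable j on variables 0..j-1.
        sub = [row[:j] for row in corr[:j]]
        rhs = [corr[k][j] for k in range(j)]
        for k, b_jk in enumerate(solve(sub, rhs)):
            t[j][k] = -b_jk
    return t


def covariance_of_components(t: Matrix, corr: Matrix) -> Matrix:
    """Return T R T', the covariance matrix of y = T x."""
    p = len(corr)
    tr = [[sum(t[i][k] * corr[k][j] for k in range(p)) for j in range(p)]
          for i in range(p)]
    return [[sum(tr[i][k] * t[j][k] for k in range(p)) for j in range(p)]
            for i in range(p)]


if __name__ == "__main__":
    R = [[1.0, 0.6, 0.3],
         [0.6, 1.0, 0.5],
         [0.3, 0.5, 1.0]]
    C = covariance_of_components(regression_components(R), R)
    print([round(C[i][i], 6) for i in range(3)])  # [1.0, 0.64, 0.75]
```

For the example matrix, $TRT'$ is diagonal up to rounding, so the components are uncorrelated. Their variances are 1, 0.64 and 0.75, which sum to 2.39 rather than 3.
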
The $x$ variables can be taken in any order to define the $y$ variables, and the objective of the technique is to choose a best order from the $p!$ possibilities. This is done by starting with $y_p$, for which $x_p$ is chosen to be the original variable that has maximum multiple correlation with the other $(p-1)$ variables. The next variable, $x_{p-1}$, from which $y_{p-1}$ is defined, minimizes $(1 + b_{p(p-1)}^2)(1 - R^2)$, where $R^2$ denotes the squared multiple correlation of $x_{p-1}$ with $x_{p-2}, x_{p-3}, \ldots, x_1$. The procedure continues in this way until only $x_1$ is left.
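
The first two selection rules can be sketched in standard-library Python, again with variables indexed from 0 and a nonsingular correlation matrix assumed. For standardized variables, the regression coefficients solve the normal equations $R_{11}b = r$, and $R^2 = b'r$. The text spells out the criterion only for choosing $x_p$ and $x_{p-1}$. The sketch therefore stops after those two steps rather than guessing how later steps are weighted. The function names are hypothetical.

```python
"""First two steps of Ottestad's (1975) ordering, sketched for a correlation matrix.

Assumptions (not from the source): variables are indexed from 0, the matrix is
nonsingular, and only the two selection rules stated in the text are implemented.
"""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def regression_coefficients(corr: Matrix, target: int,
                            predictors: Sequence[int]) -> List[float]:
    """Coefficients of a regression of `target` on `predictors` (standardized)."""
    sub = [[corr[i][k] for k in predictors] for i in predictors]
    rhs = [corr[i][target] for i in predictors]
    return solve(sub, rhs)


def squared_multiple_correlation(corr: Matrix, target: int,
                                 predictors: Sequence[int]) -> float:
    """R^2 = b'r for standardized variables; 0 when there are no predictors."""
    if not predictors:
        return 0.0
    b = regression_coefficients(corr, target, predictors)
    return sum(bk * corr[i][target] for bk, i in zip(b, predictors))


def ottestad_first_two(corr: Matrix) -> Tuple[int, int]:
    """Return the indices chosen as x_p and x_{p-1} (requires p >= 2)."""
    variables = list(range(len(corr)))
    # Step 1: x_p has the largest multiple correlation with the other p - 1
    # variables (maximizing R^2 is equivalent, since R >= 0).
    last = max(variables, key=lambda i: squared_multiple_correlation(
        corr, i, [k for k in variables if k != i]))
    rest = [k for k in variables if k != last]
    # b_{p,k}: coefficients of the regression of x_p on all remaining variables.
    b = dict(zip(rest, regression_coefficients(corr, last, rest)))

    def criterion(c: int) -> float:
        # (1 + b_{p(p-1)}^2)(1 - R^2), R^2 of candidate c on the other remaining variables.
        others = [k for k in rest if k != c]
        return (1.0 + b[c] ** 2) * (1.0 - squared_multiple_correlation(corr, c, others))

    return last, min(rest, key=criterion)


if __name__ == "__main__":
    R = [[1.0, 0.6, 0.3],
         [0.6, 1.0, 0.5],
         [0.3, 0.5, 1.0]]
    print(ottestad_first_two(R))  # (1, 2)
```

For the example matrix, the second variable (index 1) has the largest multiple correlation with the others and is placed last, as $x_p$. The third (index 2) then becomes $x_{p-1}$, which leaves the first variable as $x_1$.
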
The rationale for the method, which gives uncorrelated components, is that it provides results that are simpler to interpret than PCA in the examples that Ottestad (1975) studies. However, orthogonality of the vectors of coefficients and successive variance maximization are both lost. Unlike the techniques described in Chapter 11, the method targets no explicit form of simplicity and makes no overt attempt to limit variance loss, so it is quite different from PCA.
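
The simplest case shows both what is kept and what is lost. Take $p = 2$, with correlation $r$ between the two standardized variables. The regression of $x_2$ on $x_1$ then has coefficient $b_{21} = r$, and

$$
\begin{aligned}
y_1 &= x_1, \qquad y_2 = x_2 - r\,x_1,\\
\operatorname{cov}(y_1, y_2) &= r - r \cdot 1 = 0,\\
(1,\ 0)\cdot(-r,\ 1) &= -r,\\
\operatorname{var}(y_1) + \operatorname{var}(y_2) &= 1 + (1 - r^2) = 2 - r^2 .
\end{aligned}
$$

The components are uncorrelated, but their coefficient vectors are orthogonal only when $r = 0$. Also, $\operatorname{var}(y_1) = 1$, whereas the first principal component of the same two variables has variance $1 + |r|$.
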

A variation on the same theme is proposed by Atiqullah and Uddin (1993). They also produce new variables $y_1, y_2, \ldots, y_p$ from a set of measured variables $x_1, x_2, \ldots, x_p$ in the sequence

$$y_1 = x_1,\quad y_2 = x_2 - b_{21}x_1,\quad y_3 = x_3 - b_{31}x_1 - b_{32}x_2,\quad \ldots,$$
$$y_p = x_p - b_{p1}x_1 - b_{p2}x_2 - \cdots - b_{p(p-1)}x_{p-1},$$

but for a different set of $b_{kj}$. Although the details are not entirely clear, it appears that, unlike in Ottestad's (1975) method, the ordering in the sequence is not determined by statistical criteria. Instead, it simply follows the labels on the original $x$ variables. Atiqullah and Uddin (1993) transform the covariance matrix for the $x$ variables into upper triangular form, with diagonal elements equal to unity. The elements of this matrix above the diagonal are then the $b_{kj}$. As with Ottestad's method, the new variables, called sweep-out components, are uncorrelated.

Rather than compare variances of $y_1, y_2, \ldots, y_p$, which do not sum to $\sum_{j=1}^{p} \operatorname{var}(x_j)$, both Ottestad (1975) and Atiqullah and Uddin (1993) de-