6.3. Selecting a Subset of Variables
of m variables, but rather than treating m as fixed they also consider how
to choose m. They use methods of variable selection due to Jolliffe (1972,
1973), adding a new variant that was computationally infeasible in 1972. To
choose m, King and Jackson (1999) consider the rules described in Sections
6.1.1 and 6.1.2, including the broken stick method, together with a rule that
selects the largest value of m for which n/m > 3. To assess the quality of a
chosen subset of size m, King and Jackson (1999) compare plots of scores on
the first two PCs for the full data set and for the data set containing only
the m selected variables. They also compute a Procrustes measure of fit
(Krzanowski, 1987a) between the m-dimensional configurations given by PC
scores in the full and reduced data sets, and a weighted average of
correlations between PCs in the full and reduced data sets.
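A minimal sketch of this kind of comparison is given below, assuming the data are held in an (n × p) NumPy array X and that subset_cols is a hypothetical list of indices for the m retained variables. The Procrustes statistic computed here is the usual least-squares fit between two score configurations over orthogonal rotations; it follows the spirit of Krzanowski's (1987a) measure rather than reproducing King and Jackson's (1999) exact implementation.

```python
import numpy as np

def pc_scores(X, m):
    """Scores on the first m principal components of column-centred X."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data gives the PCs without forming the covariance matrix.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T                      # (n x m) matrix of PC scores

def procrustes_m2(A, B):
    """Least-squares fit of configuration B to configuration A over orthogonal
    rotations (a Krzanowski-style M^2 statistic; smaller values mean closer fit)."""
    sv = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.trace(A.T @ A) + np.trace(B.T @ B) - 2.0 * sv.sum()

# Hypothetical usage: compare full-data and reduced-data m-dimensional configurations.
# A = pc_scores(X, m)
# B = pc_scores(X[:, subset_cols], m)
# fit = procrustes_m2(A, B)
```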
   The data set analyzed by King and Jackson (1999) has n = 37 and p = 36.
The results of applying the various selection procedures to these data
confirm, as Jolliffe (1972, 1973) found, that methods B2 and B4 do reasonably
well. The results also confirm that the broken stick method generally chooses
smaller values of m than the other methods, though its subsets do better with
respect to the Procrustes measure of fit than some much larger subsets. The
small number of variables retained by the broken stick method implies a
correspondingly small proportion of total variance accounted for by the
subsets it selects. King and Jackson's (1999) recommendation of method B4
with the broken stick could therefore be challenged.
   We conclude this section by briefly describing a number of other possible
methods for variable selection. None uses PCs directly to select variables,
but all are related to topics discussed more fully in other sections or
chapters. Bartkowiak (1991) uses a method described earlier in Bartkowiak
(1982) to select a set of ‘representative’ variables in an example that also
illustrates the choice of the number of PCs (see Section 6.1.8). Variables
are added sequentially to a ‘representative set’ by considering each variable
currently outside the set as a candidate for inclusion. The maximum residual
sum of squares is calculated from multiple linear regressions of each of the
other excluded variables on all the variables in the set plus the candidate
variable. The candidate for which this maximum sum of squares is minimized is
then added to the set. One of Jolliffe’s (1970, 1972, 1973) rules uses a
similar idea, but in a non-sequential way. A set of m variables is chosen if
it maximizes the minimum multiple correlation between each of the (p − m)
non-selected variables and the set of m selected variables.
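A rough sketch of such a sequential scheme is given below. It assumes the data sit in a NumPy array X, uses ordinary least squares via np.linalg.lstsq for the residual sums of squares, and starts from the variable with the largest variance; the function names and the starting rule are invented for the example rather than taken from Bartkowiak (1982, 1991).

```python
import numpy as np

def residual_ss(y, Z):
    """Residual sum of squares from least-squares regression of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ coef
    return float(r @ r)

def representative_set(X, n_vars):
    """Greedy forward selection in the spirit of Bartkowiak's scheme: at each step,
    add the candidate that minimises the largest residual SS among regressions of
    the remaining excluded variables on the current set plus that candidate."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Arbitrary starting choice for the sketch: the variable with the largest variance.
    selected = [int(np.argmax(Xc.var(axis=0)))]
    while len(selected) < n_vars:
        best_j, best_score = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            Z = Xc[:, selected + [j]]
            others = [k for k in range(p) if k not in selected and k != j]
            worst = max(residual_ss(Xc[:, k], Z) for k in others) if others else 0.0
            if worst < best_score:
                best_j, best_score = j, worst
        selected.append(best_j)
    return selected
```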
   The RV-coefficient, due to Robert and Escoufier (1976), was defined in
Section 3.2. To use the coefficient to select a subset of variables, Robert
and Escoufier suggest finding X1 which maximizes RV(X, X1M), where RV(X, Y)
is defined by equation (3.2.2) of Section 3.2. The matrix X1 is the (n × m)
submatrix of X consisting of n observations on a subset of m variables, and
M is a specific (m × m) orthogonal matrix, whose construction is described
in Robert and Escoufier’s paper. It is interesting
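As an illustration of the quantities involved, the sketch below computes the RV coefficient for two column-centred data matrices from the standard definition, tr(X'YY'X) / {tr((X'X)²) tr((Y'Y)²)}^(1/2), and searches exhaustively over subsets of m columns. The orthogonal matrix M of Robert and Escoufier's procedure is not constructed here, so this is only a naive approximation of their approach, not their algorithm.

```python
import numpy as np
from itertools import combinations

def rv_coefficient(X, Y):
    """RV coefficient between two data matrices with the same rows,
    computed after column-centring each matrix."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc
    Sxx = Xc.T @ Xc
    Syy = Yc.T @ Yc
    num = np.trace(Sxy @ Sxy.T)
    den = np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))
    return num / den

def best_subset_by_rv(X, m):
    """Exhaustive search over subsets of m columns (feasible only for small p)."""
    p = X.shape[1]
    return max(combinations(range(p), m),
               key=lambda cols: rv_coefficient(X, X[:, list(cols)]))
```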