Page 173 - Jolliffe I. Principal Component Analysis

6. Choosing a Subset of Principal Components or Variables
Cadima et al. (2002) compare various algorithms for finding good subsets
according to the measures (6.3.4) and (6.3.5), and also with respect to the
RV-coefficient, which is discussed briefly below (see also Section 3.2). Two
versions of simulated annealing, a genetic algorithm, and a restricted
improvement algorithm are compared with a number of stepwise algorithms on a
total of fourteen data sets. The results show a general inferiority of the
stepwise methods, but no single algorithm outperforms all the others.
Cadima et al. (2002) recommend using simulated annealing or a genetic
algorithm to provide a starting point for a restricted improvement algorithm,
which then refines the solution. They make the interesting point that, for
large p, the number of candidate subsets is so large that, for criteria whose
range of values is bounded, it is almost inevitable that there are many
solutions that are very close to optimal. For instance, in one of their
examples, with p = 62, they find 800 solutions corresponding to a population
size of 800 in their genetic algorithm. The best of these has a value of
0.8079 for the criterion (6.3.5), but the worst is 0.8060, less than 0.3%
smaller. Of course, it is possible that the global optimum is much greater
than the best of these 800, but this seems highly unlikely.
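The strategy recommended by Cadima et al. (2002), a good starting subset that is then refined by restricted improvement, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not their implementation: the criterion is assumed to be the RV-coefficient between the centred data matrix and its orthogonal projection onto the selected columns, the names `rv_coefficient` and `restricted_improvement` are hypothetical, the improvement step is a simple first-improvement swap search, and random starting subsets stand in for the output of simulated annealing or a genetic algorithm.

```python
import numpy as np


def rv_coefficient(X, subset):
    """RV-coefficient between the centred data X (n x p) and its orthogonal
    projection onto the columns listed in `subset` (assumed criterion)."""
    Xc = X - X.mean(axis=0)
    Xk = Xc[:, list(subset)]
    # Least-squares fit of every column of Xc on the selected columns = projection.
    coef, *_ = np.linalg.lstsq(Xk, Xc, rcond=None)
    Y = Xk @ coef
    # tr(XX'YY') = ||X'Y||_F^2, tr((XX')^2) = ||X'X||_F^2, tr((YY')^2) = ||Y'Y||_F^2
    num = np.sum((Xc.T @ Y) ** 2)
    den = np.sqrt(np.sum((Xc.T @ Xc) ** 2) * np.sum((Y.T @ Y) ** 2))
    return num / den


def restricted_improvement(X, start, criterion=rv_coefficient):
    """Hypothetical swap search: replace one selected variable by an unselected
    one whenever that raises the criterion; stop when no single swap helps."""
    p = X.shape[1]
    current = sorted(int(j) for j in start)
    best = criterion(X, current)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for j in range(p):
                if j in current:
                    continue
                trial = sorted(current[:i] + [j] + current[i + 1:])
                value = criterion(X, trial)
                if value > best + 1e-12:  # strict gain guarantees termination
                    current, best, improved = trial, value, True
                    break
            if improved:
                break  # restart the scan from the new subset
    return current, best


# Simulated data: 12 variables driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
n, p, k = 100, 12, 4
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, p)) + 0.5 * rng.normal(size=(n, p))

# Random starting subsets stand in for simulated annealing / GA output.
for seed in range(5):
    start = np.random.default_rng(seed).choice(p, size=k, replace=False)
    subset, value = restricted_improvement(X, start)
    print(seed, subset, round(value, 4))
```

Comparing the values reached from different starts gives a small-scale view of how close competing solutions can be; the printed numbers depend entirely on the simulated data.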
  Al-Kandari (1998) provides an extensive study of a large number of
variable selection methods. The ideas of Jolliffe (1972, 1973) and McCabe
(1984) are compared with a variety of new methods, based on loadings in the
PCs, on correlations of the PCs with the variables, and on versions of
McCabe’s (1984) principal variables that are constructed from correlation,
rather than covariance, matrices. The methods are compared on simulated data
with a wide range of covariance or correlation structures, and on various
real data sets that are chosen to have similar covariance/correlation
structures to those of the simulated data. On the basis of the results of
these analyses, it is concluded that few of the many techniques considered
are uniformly inferior to other methods, and none is uniformly superior.
The ‘best’ method varies, depending on the covariance or correlation
structure of a data set. It also depends on the ‘measure of efficiency’ used
to determine how good a subset of variables is, as noted also by Cadima and
Jolliffe (2001). In assessing which subsets of variables are best, Al-Kandari
(1998) additionally takes into account the interpretability of the PCs based
on the subset, relative to the PCs based on all p variables (see Section 11.3).
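To make the comparison concrete, the sketch below implements three cheap selection rules and scores each chosen subset with two efficiency measures, for both a covariance and a correlation matrix. It is illustrative only: `select_b4` and `select_b2` follow the loading-based rules usually labelled B4 and B2 in Jolliffe (1972, 1973), as assumed here; `select_corr` is a hypothetical correlation-based analogue, not any specific method of Al-Kandari (1998); and the two measures (the share of tr(S) reproduced by regressing all variables on the subset, and the RV-coefficient between the data and their projection onto the subset) are chosen for the example and are not claimed to match any measure Al-Kandari used.

```python
import numpy as np


def pca(S):
    """Eigenvalues (descending) and matching eigenvectors (columns) of S."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def _pick(column, taken):
    """Index of the largest |entry| of `column` that is not already in `taken`."""
    for j in np.argsort(-np.abs(column)):
        if int(j) not in taken:
            return int(j)


def select_b4(S, k):
    """B4-style rule (assumed): for each of the first k PCs, keep the
    not-yet-chosen variable with the largest absolute loading."""
    _, A = pca(S)
    chosen = []
    for m in range(k):
        chosen.append(_pick(A[:, m], chosen))
    return sorted(chosen)


def select_b2(S, k):
    """B2-style rule (assumed): for each of the last p - k PCs, starting with
    the last, discard the remaining variable with the largest absolute loading."""
    p = S.shape[0]
    _, A = pca(S)
    dropped = []
    for m in range(p - 1, k - 1, -1):
        dropped.append(_pick(A[:, m], dropped))
    return [j for j in range(p) if j not in dropped]


def select_corr(S, k):
    """Hypothetical correlation-based analogue of B4, using
    corr(x_j, PC_m) = a_jm * sqrt(l_m) / sqrt(s_jj)."""
    vals, A = pca(S)
    R = A * np.sqrt(np.clip(vals, 0, None)) / np.sqrt(np.diag(S))[:, None]
    chosen = []
    for m in range(k):
        chosen.append(_pick(R[:, m], chosen))
    return sorted(chosen)


def variance_explained(S, subset):
    """tr(S[:,K] S[K,K]^{-1} S[K,:]) / tr(S): share of total variance reproduced
    by regressing all p variables on the subset K."""
    K = list(subset)
    SK = S[:, K]
    return np.trace(SK @ np.linalg.solve(S[np.ix_(K, K)], SK.T)) / np.trace(S)


def rv(S, subset):
    """RV-coefficient between X and its projection onto the subset, in terms of S:
    sqrt(tr([(S^2)_KK S_KK^{-1}]^2) / tr(S^2))."""
    K = list(subset)
    S2 = S @ S
    M = S2[np.ix_(K, K)] @ np.linalg.inv(S[np.ix_(K, K)])
    return np.sqrt(np.trace(M @ M) / np.trace(S2))


rng = np.random.default_rng(0)
n, p, k = 200, 8, 3
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, p)) + 0.3 * rng.normal(size=(n, p))
X *= rng.uniform(0.5, 5.0, size=p)  # unequal variances: covariance != correlation
for name, S in [("covariance", np.cov(X, rowvar=False)),
                ("correlation", np.corrcoef(X, rowvar=False))]:
    for rule in (select_b2, select_b4, select_corr):
        K = rule(S, k)
        print(f"{name:<11} {rule.__name__:<11} {K} "
              f"var={variance_explained(S, K):.3f} rv={rv(S, K):.3f}")
```

Changing the seed, the scaling of the variables, or the measure can change which rule comes out ahead, which is the kind of dependence on structure and on the measure of efficiency reported above.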
  Al-Kandari (1998) also discusses the distinction between criteria used to
choose subsets of variables and criteria used to evaluate how good a chosen
subset is. The latter are her ‘measures of efficiency’, and ideally these same
criteria should be used to choose subsets in the first place. However, this
may be computationally infeasible, in which case a suboptimal but
computationally straightforward criterion is used to do the choosing instead.
Some of Al-Kandari’s (1998) results are reported in Al-Kandari and Jolliffe
(2001) for covariance, but not correlation, matrices.
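The distinction can be illustrated with a small Python sketch in which a cheap choosing rule is scored by an efficiency measure and compared with an exhaustive search that optimizes that measure directly. The measure (share of tr(S) reproduced by regressing all variables on the subset), the loading-based choosing rule and all function names are assumptions for illustration, not Al-Kandari's definitions; the point is only that searching all C(p, k) subsets is easy for small p and quickly becomes infeasible as p grows.

```python
import itertools
import math

import numpy as np


def variance_explained(S, subset):
    """Efficiency measure (assumed): tr(S[:,K] S[K,K]^{-1} S[K,:]) / tr(S)."""
    K = list(subset)
    SK = S[:, K]
    return np.trace(SK @ np.linalg.solve(S[np.ix_(K, K)], SK.T)) / np.trace(S)


def choose_by_loadings(S, k):
    """Cheap choosing criterion (assumed): for each of the first k PCs, take the
    not-yet-chosen variable with the largest absolute loading."""
    vals, vecs = np.linalg.eigh(S)
    A = vecs[:, np.argsort(vals)[::-1]]  # PCs in order of decreasing variance
    chosen = []
    for m in range(k):
        for j in np.argsort(-np.abs(A[:, m])):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return sorted(chosen)


def best_subset_exhaustive(S, k, measure=variance_explained):
    """Evaluate `measure` on all C(p, k) subsets and return the best one."""
    p = S.shape[0]
    candidates = ((list(K), measure(S, K)) for K in itertools.combinations(range(p), k))
    return max(candidates, key=lambda pair: pair[1])


rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 10)) + 0.4 * rng.normal(size=(150, 10))
S = np.corrcoef(X, rowvar=False)
k = 3

cheap = choose_by_loadings(S, k)
best, best_value = best_subset_exhaustive(S, k)
print("chosen by loadings:", cheap, round(variance_explained(S, cheap), 4))
print("exhaustive optimum:", best, round(best_value, 4))

# Size of the search space: trivial for p = 10, enormous for p = 62.
for p_, k_ in [(10, k), (62, 10), (62, 31)]:
    print(f"C({p_}, {k_}) = {math.comb(p_, k_):,}")
```

By construction, the exhaustively chosen subset can never score lower than the cheaply chosen one on the measure being optimized; how large the gap is depends on the data.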
  King and Jackson (1999) combine some of the ideas of the present section
with some from Section 6.1. Their main objective is to select a subset