6. Choosing a Subset of Principal Components or Variables
Cadima et al. (2002) compare various algorithms for finding good subsets
according to the measures (6.3.4) and (6.3.5), and also with respect to the
RV-coefficient, which is discussed briefly below (see also Section 3.2). Two
versions of simulated annealing, a genetic algorithm and a restricted
improvement algorithm are compared with a number of stepwise algorithms
on a total of fourteen data sets. The results show a general inferiority of
the stepwise methods, but no single algorithm outperforms all the others.
Cadima et al. (2002) recommend using simulated annealing or a genetic
algorithm to provide a starting point for a restricted improvement algorithm,
which then refines the solution. They make the interesting point that, for
large p, the number of candidate subsets is so large that, for criteria whose
range of values is bounded, it is almost inevitable that many solutions lie
very close to the optimum. For instance, in one of their examples, with
p = 62, they find 800 solutions, corresponding to a population size of
800 in their genetic algorithm. The best of these has a value of 0.8079 for
the criterion (6.3.5), but the worst is 0.8060, less than 0.3% smaller. Of
course, it is possible that the global optimum is much greater than the best
of these 800, but this seems highly unlikely.
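As a concrete illustration of this kind of search, the sketch below shows a
minimal simulated annealing over k-variable subsets. It is hypothetical
throughout: the stand-in criterion rv_like_criterion (the fraction of total
variance of X reproduced by regressing all variables on the chosen columns)
is only a placeholder for a bounded measure such as (6.3.5), and none of the
names or parameter choices come from Cadima et al.'s implementation.

import numpy as np

def rv_like_criterion(X, subset):
    """Hypothetical bounded criterion standing in for (6.3.5): fraction of
    the total variance of (column-centred) X reproduced by regressing all
    variables on the chosen columns."""
    Xs = X[:, list(subset)]
    coef, *_ = np.linalg.lstsq(Xs, X, rcond=None)  # project X onto span(Xs)
    resid = X - Xs @ coef
    return 1.0 - (resid ** 2).sum() / (X ** 2).sum()

def anneal_subset(X, k, criterion, n_iter=5000, t0=0.1, seed=0):
    """Simulated annealing over k-variable subsets: propose one-variable
    swaps, always accept improvements, and accept worse moves with a
    probability that shrinks as the temperature cools."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    current = set(int(v) for v in rng.choice(p, size=k, replace=False))
    f_curr = criterion(X, current)
    best, f_best = set(current), f_curr
    for i in range(n_iter):
        temp = max(t0 * (1.0 - i / n_iter), 1e-12)  # linear cooling schedule
        out = int(rng.choice(sorted(current)))                  # drop one chosen variable
        inn = int(rng.choice(sorted(set(range(p)) - current)))  # add one unchosen variable
        cand = (current - {out}) | {inn}
        f_cand = criterion(X, cand)
        # Metropolis acceptance rule
        if f_cand >= f_curr or rng.random() < np.exp((f_cand - f_curr) / temp):
            current, f_curr = cand, f_cand
            if f_curr > f_best:
                best, f_best = set(current), f_curr
    return sorted(best), f_best

A restricted improvement pass, in the hybrid spirit recommended above, could
then repeat the same swap move greedily from the annealed subset, accepting
only strict improvements until none remains.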
Al-Kandari (1998) provides an extensive study of a large number of
variable selection methods. The ideas of Jolliffe (1972, 1973) and McCabe
(1984) are compared with a variety of new methods, based on loadings in
the PCs, on correlations of the PCs with the variables, and on versions of
McCabe’s (1984) principal variables that are constructed from correlation,
rather than covariance, matrices. The methods are compared on simulated
data with a wide range of covariance or correlation structures, and on
various real data sets chosen to have covariance/correlation structures
similar to those of the simulated data. On the basis of the results of
these analyses, it is concluded that few of the many techniques considered
are uniformly inferior to other methods, and none is uniformly superior.
The ‘best’ method varies, depending on the covariance or correlation
structure of a data set. It also depends on the ‘measure of efficiency’ used
to determine how good a subset of variables is, as noted also by Cadima and
Jolliffe (2001). In assessing which subsets of variables are best, Al-Kandari
(1998) additionally takes into account the interpretability of the PCs based
on the subset, relative to the PCs based on all p variables (see Section 11.3).
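To make the loadings-based family of methods concrete, here is a brief
sketch in the spirit of Jolliffe (1972), not Al-Kandari's exact procedures:
each of the first q correlation-matrix PCs is associated with the
not-yet-chosen variable having the largest absolute loading on it, and those
q variables are retained. The function name and interface are illustrative.

import numpy as np

def select_by_loadings(X, q):
    """Associate each of the first q correlation-matrix PCs with its
    highest-|loading| unchosen variable; retain those q variables."""
    R = np.corrcoef(X, rowvar=False)      # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # PCs sorted by variance, descending
    chosen = []
    for j in order[:q]:
        loadings = np.abs(eigvecs[:, j])
        for v in np.argsort(loadings)[::-1]:  # highest absolute loading first
            if v not in chosen:
                chosen.append(int(v))
                break
    return chosen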
Al-Kandari (1998) also discusses the distinction between criteria used to
choose subsets of variables and criteria used to evaluate how good a chosen
subset is. The latter are her ‘measures of efficiency’, and ideally these same
criteria should be used to choose subsets in the first place. However, this
may be computationally infeasible, so a suboptimal but computationally
straightforward criterion is used to do the choosing instead. Some of
Al-Kandari’s (1998) results are reported in Al-Kandari and Jolliffe (2001)
for covariance, but not correlation, matrices.
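Continuing with the hypothetical helpers sketched above, this division of
labour, where a cheap rule does the choosing and a separate ‘measure of
efficiency’ does the judging, might look like the following.

import numpy as np

# Toy data with correlated columns (purely illustrative)
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10)) @ rng.standard_normal((10, 10))
X -= X.mean(axis=0)                   # column-centre the data

chosen = select_by_loadings(X, q=3)   # cheap choosing criterion
score = rv_like_criterion(X, chosen)  # separate evaluation measure
print(chosen, round(score, 4))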
King and Jackson (1999) combine some of the ideas of the present section
with some from Section 6.1. Their main objective is to select a subset

