Page 161 - Jolliffe I. Principal Component Analysis
6. Choosing a Subset of Principal Components or Variables
Section 6.1.1. They use instead the percentage of ‘signal’ accounted for,
although the PCA is done on a covariance matrix other than that associated
with the signal (see Section 12.4.3). Buell (1978) advocates stability
with respect to different degrees of approximation of a continuous spatial
field by discrete points as a criterion for choosing m. Section 13.3.4 of von
Storch and Zwiers (1999) is dismissive of selection rules.
6.1.8 Discussion
Although many rules have been examined in the last seven subsections,
the list is by no means exhaustive. For example, in Section 5.1 we noted
that superimposing a minimum spanning tree on a plot of the observations
with respect to the first two PCs gives a subjective indication of whether or
not a two-dimensional representation is adequate. It is not possible to give
definitive guidance on which rules are best, but we conclude this section
with a few comments on their relative merits. First, though, we discuss a
small selection of the many comparative studies that have been published.
Reddon (1984, Section 3.9) describes nine such studies, mostly from the
psychological literature, but all are concerned with factor analysis rather
than PCA. A number of later studies in the ecological, psychological and
meteorological literatures have examined various rules on both real and
simulated data sets. Simulations of multivariate data sets can always be
criticized as unrepresentative, because they can never explore more than
a tiny fraction of the vast range of possible correlation and covariance
structures. Several of the published studies, for example Grossman et al.
(1991) and Richman (1988), are particularly weak in this respect, looking only
at simulations where all p of the variables are uncorrelated, a situation
that is extremely unlikely to be of much interest in practice. Another
weakness of several psychology-based studies is their confusion between
PCA and factor analysis. For example, Zwick and Velicer (1986) state that
‘if PCA is used to summarize a data set each retained component must
contain at least two substantial loadings.’ If the word ‘summarize’ implies
a descriptive purpose the statement is nonsense, but in the simulation study
that follows all their ‘components’ have three or more large loadings. With
this structure, based on factor analysis, it is no surprise that Zwick and
Velicer (1986) conclude that some of the rules they compare, which were
designed with descriptive PCA in mind, retain ‘too many’ factors.
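
To make the structure of such comparisons concrete, the following Python sketch applies two of the rules from the preceding subsections, Kaiser's rule and the broken stick rule, to simulated correlation matrices. It is only an illustration under stated assumptions and does not reproduce the design of any study cited here: the function names, the sample size n = 100, the within-group correlation of 0.8, the random seed and the Kaiser cut-off of 1 are all hypothetical choices. The broken stick rule is taken in its usual form, retaining the kth PC while its proportion of variance exceeds (1/p) times the sum of 1/j for j = k, ..., p.

```python
import numpy as np


def kaiser_rule(eigenvalues, cutoff=1.0):
    """Count eigenvalues of a correlation matrix that exceed `cutoff`."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > cutoff))


def broken_stick_rule(eigenvalues):
    """Count leading PCs whose variance share exceeds the broken-stick share."""
    l = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    p = l.size
    # Expected share of the k-th longest piece: (1/p) * sum_{j=k}^{p} 1/j
    reciprocals = 1.0 / np.arange(1, p + 1)
    expected = np.cumsum(reciprocals[::-1])[::-1] / p
    observed = l / l.sum()
    m = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break  # stop at the first PC that fails the comparison
        m += 1
    return m


def simulate_eigenvalues(corr, n, rng):
    """Eigenvalues (descending) of the sample correlation matrix of n draws."""
    p = corr.shape[0]
    x = rng.multivariate_normal(np.zeros(p), corr, size=n)
    sample_corr = np.corrcoef(x, rowvar=False)
    return np.linalg.eigvalsh(sample_corr)[::-1]


rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
p, n = 12, 100                  # hypothetical dimensions

# Structure 1: all p variables uncorrelated.
uncorrelated = np.eye(p)

# Structure 2: 3 groups of 4 variables, correlation 0.8 within groups
# and 0 between groups (an assumed, purely illustrative structure).
block = np.full((4, 4), 0.8)
np.fill_diagonal(block, 1.0)
grouped = np.kron(np.eye(3), block)

for name, corr in [("uncorrelated", uncorrelated), ("grouped", grouped)]:
    eig = simulate_eigenvalues(corr, n, rng)
    print(name, "Kaiser:", kaiser_rule(eig),
          "broken stick:", broken_stick_rule(eig))
```

With the population eigenvalues of the grouped structure (3.4 three times and 0.2 nine times) both rules retain three components. In the uncorrelated case the population eigenvalues all equal 1 and nothing is retained, which is one way of seeing why simulations restricted to that case say little about how the rules behave on structured data.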
Jackson (1993) investigates a rather broader range of structures, including
up to 12 variables in up to 3 correlated groups, as well as the completely
uncorrelated case. The range of stopping rules is also fairly wide, including:
Kaiser’s rule; the scree graph; the broken stick rule; the proportion of
total variance; tests of equality of eigenvalues; and Jackson’s two bootstrap
procedures described in Section 6.1.5. Jackson (1993) concludes that the
broken stick and bootstrapped eigenvalue-eigenvector rules give the best
results in his study. However, as with the reasoning used to develop his

