Page 148 - Jolliffe I. Principal Component Analysis
6.1. How Many Principal Components?
which the scree graph defines a more-or-less straight line, not necessarily
horizontal. The first point on the straight line is then taken to be the last
factor/component to be retained. If there are two or more straight lines
formed by the lower eigenvalues, then the cut-off is taken at the upper (left-
hand) end of the left-most straight line. Cattell (1966) discusses at some
length whether the left-most point on the straight line should correspond
to the first excluded factor or the last factor to be retained. He concludes
that it is preferable to include this factor, although both variants are used
in practice.
The rule in Section 6.1.1 is based on $t_m = \sum_{k=1}^{m} l_k$, the rule in
Section 6.1.2 looks at individual eigenvalues $l_k$, and the current rule, as
applied to PCA, uses $l_{k-1} - l_k$ as its criterion. There is, however, no
formal numerical cut-off based on $l_{k-1} - l_k$ and, in fact, judgments of
when $l_{k-1} - l_k$ stops being large (steep) will depend on the relative
values of $l_{k-1} - l_k$ and $l_k - l_{k+1}$, as well as the absolute value
of $l_{k-1} - l_k$. Thus the rule is based subjectively on the second, as well
as the first, differences among the $l_k$. Because of this, it is difficult to
write down a formal numerical rule and the procedure has until recently
remained purely graphical. Tests that attempt to formalize the procedure, due
to Bentler and Yuan (1996, 1998), are discussed in Section 6.1.4.
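As a small illustration of the quantities compared above (the cumulative sums $t_m$, the individual eigenvalues $l_k$, the first differences $l_{k-1} - l_k$, and the second differences on which the scree judgment also rests), the sketch below tabulates them from a list of eigenvalues. The function name `scree_criteria` and the example eigenvalues are hypothetical. The code deliberately applies no cut-off, because, as noted above, none is formally defined for the scree rule.

```python
from typing import Dict, Sequence, Tuple


def scree_criteria(
    eigenvalues: Sequence[float],
) -> Tuple[Dict[int, float], Dict[int, float], Dict[int, float]]:
    """Tabulate t_m, first and second differences (1-based indices, as in the text).

    eigenvalues: l_1 >= l_2 >= ... >= l_p (hypothetical input).
    """
    l = list(eigenvalues)
    p = len(l)
    # t_m = l_1 + ... + l_m, for m = 1, ..., p
    t = {m: sum(l[:m]) for m in range(1, p + 1)}
    # d_k = l_{k-1} - l_k, the drop between neighbouring points of the scree graph
    d = {k: l[k - 2] - l[k - 1] for k in range(2, p + 1)}
    # second difference d_k - d_{k+1} = (l_{k-1} - l_k) - (l_k - l_{k+1})
    d2 = {k: d[k] - d[k + 1] for k in range(2, p)}
    return t, d, d2


if __name__ == "__main__":
    # Illustrative eigenvalues only, not taken from any data set in the text.
    l = [4.0, 2.0, 1.0, 0.5, 0.25]
    t, d, d2 = scree_criteria(l)
    for k in range(2, len(l)):
        print(k, t[k], d[k], d2[k])
```

Indices are kept 1-based so that `d[k]` corresponds directly to $l_{k-1} - l_k$ in the notation of the text.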
Cattell’s formulation, where we look for the point at which $l_{k-1} - l_k$
becomes fairly constant for several subsequent values, is perhaps less
subjective, but still requires some degree of judgment. Both formulations of
the rule seem to work well in practice, provided that there is a fairly sharp
‘elbow,’ or change of slope, in the graph. However, if the slope gradually
becomes less steep, with no clear elbow, as in Figure 6.1, then it is clearly
less easy to use the procedure.
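Cattell’s formulation can be mimicked numerically, with the caveat that any tolerance merely moves the judgment into a parameter. In the minimal sketch below, which is a hypothetical stand-in rather than Cattell’s own procedure, the differences $l_{k-1} - l_k$ are treated as fairly constant from $k_0$ onwards when `run` consecutive differences spread by at most `rel_tol` times $l_1 - l_p$. The function name, the spread criterion and the default values are all assumptions. Constant differences from $k_0$ on mean that $l_{k_0-1}, l_{k_0}, \dots$ lie on a straight line, so the function returns $k_0 - 1$, the first point on that line.

```python
from typing import Optional, Sequence


def straight_line_start(
    eigenvalues: Sequence[float], run: int = 3, rel_tol: float = 0.05
) -> Optional[int]:
    """Return the 1-based index of the first point on the straight part of the scree graph.

    Hypothetical rule: d_k = l_{k-1} - l_k counts as "fairly constant" from k0 on
    when the `run` differences d_{k0}, ..., d_{k0+run-1} differ by at most
    rel_tol * (l_1 - l_p). Returns None if no such stretch exists.
    """
    if run < 2:
        raise ValueError("run must be at least 2 to compare differences")
    l = list(eigenvalues)
    p = len(l)
    scale = l[0] - l[-1]
    # d[k] = l_{k-1} - l_k for 1-based k = 2, ..., p
    d = {k: l[k - 2] - l[k - 1] for k in range(2, p + 1)}
    # k0 + run - 1 must not exceed p
    for k0 in range(2, p - run + 2):
        window = [d[k] for k in range(k0, k0 + run)]
        if max(window) - min(window) <= rel_tol * scale:
            # constant d_k for k >= k0 means l_{k0-1}, l_{k0}, ... are collinear
            return k0 - 1
    return None


if __name__ == "__main__":
    l = [5.0, 2.5, 1.0, 0.8, 0.6, 0.4, 0.2]  # illustrative values only
    start = straight_line_start(l)
    if start is not None:
        print("first point on the straight line:", start)
        print("retain, first point included (Cattell's preference):", start)
        print("retain, first point excluded:", start - 1)
```

Under Cattell’s preferred variant the first point on the line is retained, so $m$ equals the returned index; under the alternative it is the first excluded component and $m$ is one less. For a graph like Figure 6.1, with no clear elbow, the result can change with `rel_tol`, which mirrors the difficulty described above.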
A number of methods have been suggested in which the scree plot is
compared with a corresponding plot representing given percentiles, often a
95 percentile, of the distributions of each variance (eigenvalue) when PCA
is done on a ‘random’ matrix. Here ‘random’ usually refers to a correlation
matrix obtained from a random sample of n observations on p uncorrelated
normal random variables, where n, p are chosen to be the same as for the
data set of interest. A number of varieties of this approach, which goes
under the general heading parallel analysis, have been proposed in the
psychological literature. Parallel analysis dates back to Horn (1965), where
it was described as determining the number of factors in factor analysis.
Its ideas have since been applied, sometimes inappropriately, to PCA.
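The eigenvalue-based comparison can be sketched with NumPy, assuming the simulation route: generate random data sets of n observations on p uncorrelated standard normal variables (same n and p as the data), collect the ordered eigenvalues of their correlation matrices, and take the 95th percentile of each. The retained number m then counts the leading components whose observed eigenvalue exceeds its simulated percentile, i.e. the largest m for which the scree graph stays above the percentile graph. The function name `parallel_analysis`, the 200 replicates and the fixed seed are illustrative assumptions; published versions of parallel analysis differ in such details.

```python
from typing import Tuple

import numpy as np


def parallel_analysis(
    x: np.ndarray, n_sim: int = 200, percentile: float = 95.0, seed: int = 0
) -> Tuple[np.ndarray, np.ndarray, int]:
    """Compare correlation-matrix eigenvalues of x (n x p) with simulated percentiles."""
    n, p = x.shape
    # eigvalsh returns ascending eigenvalues; reverse to get l_1 >= ... >= l_p
    observed = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, p))
    for i in range(n_sim):
        # n observations on p uncorrelated standard normal variables
        z = rng.standard_normal((n, p))
        sims[i] = np.linalg.eigvalsh(np.corrcoef(z, rowvar=False))[::-1]
    # percentile of the k-th largest random eigenvalue, separately for each k
    threshold = np.percentile(sims, percentile, axis=0)
    above = observed > threshold
    # number of leading components lying above the percentile graph
    m = p if above.all() else int(np.argmin(above))
    return observed, threshold, m


if __name__ == "__main__":
    # Illustrative data: two independent blocks of three correlated variables
    rng = np.random.default_rng(1)
    n = 300
    factors = rng.standard_normal((n, 2))
    pattern = np.array([[1, 1, 1, 0, 0, 0],
                        [0, 0, 0, 1, 1, 1]], dtype=float)
    x = factors @ pattern + 0.5 * rng.standard_normal((n, 6))
    observed, threshold, m = parallel_analysis(x)
    print(np.round(observed, 2))
    print(np.round(threshold, 2))
    print("retain m =", m)
```

The illustrative data are built from two independent blocks of strongly correlated variables, so two eigenvalues should sit well above the simulated percentiles and the remaining four well below.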
Most of its variants use simulation to construct the 95 percentiles
empirically, and some examine ‘significance’ of loadings (eigenvectors), as
well as eigenvalues, using similar reasoning. Franklin et al. (1995) cite many
of the most relevant references in attempting to popularize parallel analysis
amongst ecologists. The idea in versions of parallel analysis that
concentrate on eigenvalues is to retain m PCs, where m is the largest integer for
which the scree graph lies above the graph of upper 95 percentiles. Boot-

