Page 146 - Jolliffe I. Principal Component Analysis
6.1. How Many Principal Components?
As well as these intuitive justifications, Kaiser (1960) put forward a number
of other reasons for a cut-off at $l_k = 1$. It must be noted, however, that
most of the reasons are pertinent to factor analysis (see Chapter 7), rather
than PCA, although Kaiser refers to PCs in discussing one of them.
It can be argued that a cut-off at $l_k = 1$ retains too few variables. Consider
a variable which, in the population, is more-or-less independent of
all other variables. In a sample, such a variable will have small coefficients
in (p − 1) of the PCs but will dominate one of the PCs, whose variance
$l_k$ will be close to 1 when using the correlation matrix. As the variable
provides independent information from the other variables it would be unwise
to delete it. However, deletion will occur if Kaiser’s rule is used, and
if, due to sampling variation, $l_k < 1$. It is therefore advisable to choose
a cut-off $l^*$ lower than 1, to allow for sampling variation. Jolliffe (1972)
suggested, based on simulation studies, that $l^* = 0.7$ is roughly the correct
level. Further discussion of this cut-off level will be given with respect to
examples in Sections 6.2 and 6.4.
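The effect of lowering the cut-off can be illustrated with a minimal sketch. The eigenvalues below are made up for illustration (they are not taken from any example in the text), and the helper name `n_retained` is hypothetical. The third eigenvalue, 0.9, plays the role of a PC dominated by a nearly independent variable whose variance has fallen just below 1 through sampling variation.

```python
def n_retained(eigenvalues, cutoff):
    """Count the PCs whose variance l_k exceeds the cut-off l*."""
    return sum(1 for l_k in eigenvalues if l_k > cutoff)

# Hypothetical correlation-matrix eigenvalues for p = 5 variables.
# They sum to 5, as the eigenvalues of a correlation matrix must.
eigenvalues = [2.6, 1.2, 0.9, 0.2, 0.1]

kaiser_count = n_retained(eigenvalues, cutoff=1.0)   # Kaiser's rule, l* = 1
lowered_count = n_retained(eigenvalues, cutoff=0.7)  # lowered cut-off, l* = 0.7

print(kaiser_count, lowered_count)  # prints: 2 3
```

With these values Kaiser's rule discards the third PC, whereas the cut-off of 0.7 keeps it.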
The rule just described is specifically designed for correlation matrices,
but it can be easily adapted for covariance matrices by taking as a cut-off $l^*$
the average value $\bar{l}$ of the eigenvalues or, better, a somewhat lower cut-off
such as $l^* = 0.7\bar{l}$. For covariance matrices with widely differing variances,
however, this rule and the one based on $t_k$ from Section 6.1.1 retain very
few (arguably, too few) PCs, as will be seen in the examples of Section 6.2.
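As a rough sketch of the covariance-matrix version of the rule, the cut-off can be computed as a fraction of the average eigenvalue. The function names and the eigenvalues are hypothetical, chosen only to mimic a covariance matrix in which one variance is much larger than the rest.

```python
def covariance_cutoff(eigenvalues, factor=0.7):
    """Cut-off l* = factor * (average eigenvalue)."""
    mean_eigenvalue = sum(eigenvalues) / len(eigenvalues)
    return factor * mean_eigenvalue

def retained_components(eigenvalues, factor=0.7):
    """Indices k (starting at 1) of the PCs whose variance l_k exceeds l*."""
    cutoff = covariance_cutoff(eigenvalues, factor)
    return [k for k, l_k in enumerate(eigenvalues, start=1) if l_k > cutoff]

# Hypothetical covariance-matrix eigenvalues with widely differing sizes.
eigenvalues = [950.0, 40.0, 6.0, 3.0, 1.0]

print(covariance_cutoff(eigenvalues))    # approximately 140
print(retained_components(eigenvalues))  # prints: [1]
```

Only the first PC survives here, which is the behaviour described above for widely differing variances. For a correlation matrix the average eigenvalue is 1, so the same function reduces to the cut-off of 0.7.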
An alternative way of looking at the sizes of individual variances is to use
the so-called broken stick model. If we have a stick of unit length, broken
at random into p segments, then it can be shown that the expected length
of the kth longest segment is
$$l_k^* = \frac{1}{p} \sum_{j=k}^{p} \frac{1}{j}.$$
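The expected lengths can be checked numerically. The sketch below assumes that "broken at random" means cutting the stick at p − 1 independent, uniformly distributed points, which is the usual reading of the broken stick model; the function names, number of trials and seed are illustrative choices.

```python
import random

def expected_lengths(p):
    """Expected length of the kth longest segment: (1/p) * sum_{j=k}^{p} 1/j."""
    return [sum(1.0 / j for j in range(k, p + 1)) / p for k in range(1, p + 1)]

def simulate_broken_stick(p, trials=20000, seed=0):
    """Monte Carlo estimate of the mean length of the kth longest segment.

    Assumption: the unit stick is cut at p - 1 independent uniform points.
    """
    rng = random.Random(seed)
    totals = [0.0] * p
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(p - 1))
        points = [0.0] + cuts + [1.0]
        # Segment lengths, ordered from longest to shortest.
        segments = sorted((b - a for a, b in zip(points, points[1:])), reverse=True)
        for k, length in enumerate(segments):
            totals[k] += length
    return [total / trials for total in totals]

p = 4
for k, (formula, simulated) in enumerate(
        zip(expected_lengths(p), simulate_broken_stick(p)), start=1):
    print(k, round(formula, 4), round(simulated, 4))
```

For p = 4 the formula gives 25/48, 13/48, 7/48 and 3/48, and the simulated averages should agree with these to about two decimal places.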
One way of deciding whether the proportion of variance accounted for by
the kth PC is large enough for that component to be retained is to compare
the proportion with $l_k^*$. Principal components for which the proportion
exceeds $l_k^*$ are then retained, and all other PCs deleted. Tables of $l_k^*$ are
available for various values of p and k (see, for example, Legendre and
Legendre (1983, p. 406)).
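The comparison just described can be sketched in a few lines, which also removes the need for tables. The function names and the eigenvalues are hypothetical. The sketch follows the wording above literally, so each PC is compared with its own expected segment length.

```python
def broken_stick(p):
    """Expected lengths l*_k = (1/p) * sum_{j=k}^{p} 1/j, for k = 1, ..., p."""
    lengths = []
    tail = 0.0
    for j in range(p, 0, -1):     # accumulate 1/p, 1/(p-1), ..., 1/1
        tail += 1.0 / j
        lengths.append(tail / p)  # this is l*_j
    return lengths[::-1]          # reorder so that index 0 holds l*_1

def retained_by_broken_stick(eigenvalues):
    """Indices k (starting at 1) whose proportion of variance exceeds l*_k."""
    total = sum(eigenvalues)
    expected = broken_stick(len(eigenvalues))
    return [k for k, (l_k, e_k) in enumerate(zip(eigenvalues, expected), start=1)
            if l_k / total > e_k]

# Hypothetical eigenvalues for p = 5 (proportions 0.60, 0.20, 0.10, 0.07, 0.03).
eigenvalues = [3.0, 1.0, 0.5, 0.35, 0.15]

print([round(e_k, 4) for e_k in broken_stick(5)])
print(retained_by_broken_stick(eigenvalues))  # prints: [1]
```

Here only the first PC accounts for more variance (0.60) than its broken stick value (about 0.457). The second PC's proportion of 0.20 falls short of about 0.257, so it is deleted.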
6.1.3 The Scree Graph and the Log-Eigenvalue Diagram
The first two rules described above usually involve a degree of subjectivity
in the choice of cut-off levels, $t^*$ and $l^*$ respectively. The scree graph,
which was discussed and named by Cattell (1966) but which was already
in common use, is even more subjective in its usual form, as it involves
looking at a plot of $l_k$ against k (see Figure 6.1, which is discussed in detail
in Section 6.2) and deciding at which value of k the slopes of lines joining

