Page 150 - Jolliffe I. Principal Component Analysis
6.1. How Many Principal Components?
is based on the assumption of multivariate normality for x, and is only
approximately true even then. The second problem is concerned with the
fact that unless H_{0,p−2} is rejected, there are several tests to be done, so
that the overall significance level of the sequence of tests is not the same
as the individual significance levels of each test. Furthermore, it is difficult
to get even an approximate idea of the overall significance level because
the number of tests done is not fixed but random, and the tests are not
independent of each other. It follows that, although the testing sequence
suggested above can be used to estimate m, it is dangerous to treat the
procedure as a formal piece of statistical inference, as significance levels are
usually unknown. The reverse sequence H_{00}, H_{01}, ... can be used instead
until the first non-rejection occurs (Jackson, 1991, Section 2.6), but this
suffers from similar problems.
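The testing sequence described above can be sketched in code. The following is a minimal illustration, not code from the text: it applies Bartlett-style chi-square tests of H_{0,k} (the last p − k eigenvalues are equal) in the order H_{0,p−2}, H_{0,p−3}, ..., stopping at the first rejection. The function name, the significance level, and the exact form of the chi-square approximation (here the common statistic n q log of the arithmetic-to-geometric mean ratio of the tail eigenvalues, with (q + 2)(q − 1)/2 degrees of freedom, and no small-sample correction factor) are my assumptions for illustration.

```python
import numpy as np
from scipy.stats import chi2

def sequential_eigenvalue_tests(eigvals, n, alpha=0.05):
    """Estimate m, the number of PCs to retain, by the sequential procedure.

    eigvals : sample eigenvalues l_1 >= ... >= l_p
    n       : sample size used in the chi-square approximation
    Tests H_{0,k} for k = p-2, p-3, ..., 0 and stops at the first rejection;
    returns the last k whose H_{0,k} was not rejected (retain the first k PCs).
    Illustrative sketch only; no small-sample correction is applied.
    """
    l = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = len(l)
    m = p                                   # if even H_{0,p-2} is rejected, keep all
    for k in range(p - 2, -1, -1):          # H_{0,p-2}, H_{0,p-3}, ..., H_{0,0}
        tail = l[k:]                        # the last q = p - k eigenvalues
        q = p - k
        # Bartlett-style statistic: n * q * log(arithmetic mean / geometric mean)
        stat = n * (q * np.log(tail.mean()) - np.sum(np.log(tail)))
        df = (q + 2) * (q - 1) / 2          # df of the chi-square approximation
        if chi2.sf(stat, df) < alpha:       # rejection: tail not plausibly equal
            break                           # stop; keep the current estimate m
        m = k                               # last p-k eigenvalues plausibly equal
    return m
```

With eigenvalues such as (10, 5, 1, 1, 1, 1) the procedure retains two components; with all eigenvalues equal it retains none, since no test in the sequence rejects.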
The procedure could be added to the list of ad hoc rules, but it has
one further, more practical, disadvantage, namely that in nearly all real
examples it tends to retain more PCs than are really necessary. Bartlett
(1950), in introducing the procedure for correlation matrices, refers to it
as testing how many of the PCs are statistically significant, but ‘statistical
significance’ in the context of these tests does not imply that a PC accounts
for a substantial proportion of the total variation. For correlation matrices,
Jolliffe (1970) found that the rule often corresponds roughly to choosing a
cut-off l^* of about 0.1 to 0.2 in the method of Section 6.1.2. This is much
smaller than is recommended in that section, and occurs because defining
unimportant PCs as those with variances equal to that of the last PC is
not necessarily a sensible way of finding m. If this definition is acceptable,
as it may be if the model of Tipping and Bishop (1999a) (see Section 3.9) is
assumed, for example, then the sequential testing procedure may produce
satisfactory results, but it is easy to construct examples where the method
gives silly answers. For instance, if there is one near-constant relationship
among the elements of x, with a much smaller variance than any other
PC, then the procedure rejects H_{0,p−2} and declares that all PCs need to
be retained, regardless of how nearly equal are the next few eigenvalues.
The method of this section is similar in spirit to, though more formalized
than, one formulation of the scree graph. Looking for the first ‘shallow’
slope in the graph corresponds to looking for the first of two consecutive
eigenvalues that are nearly equal. The scree graph differs from the formal
testing procedure in that it starts from the largest eigenvalue and compares
consecutive eigenvalues two at a time, whereas the tests start with
the smallest eigenvalues and compare blocks of two, three, four and so on.
Another difference is that the ‘elbow’ point is retained in Cattell’s formu-
lation of the scree graph, but excluded in the testing procedure. The scree
graph is also more subjective but, as has been stated above, the objectivity
of the testing procedure is something of an illusion.
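The scree-graph reading described above, scanning from the largest eigenvalue for the first pair of nearly equal consecutive eigenvalues, can be illustrated by a small sketch. The function name and the relative-gap threshold `tol` are my own illustrative choices, not recommendations from the text.

```python
import numpy as np

def first_shallow_slope(eigvals, tol=0.05):
    """Return the 1-based position of the first eigenvalue whose drop to the
    next eigenvalue is small (relative gap below `tol`), i.e. where the first
    shallow slope in the scree graph begins.  Cattell's formulation would
    retain up to and including this 'elbow' point; the formal testing
    procedure of the text would exclude it.
    """
    l = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    for k in range(len(l) - 1):
        if (l[k] - l[k + 1]) / l[k] < tol:   # nearly equal consecutive pair
            return k + 1                     # first member of the flat stretch
    return len(l)                            # no shallow slope found
```

For eigenvalues (10, 5, 1, 1, 1, 1), for example, the first shallow slope begins at the third eigenvalue, since l_3 and l_4 are the first nearly equal pair.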
Cattell’s original formulation of the scree graph differs from the above
since it is the differences l_{k−1} − l_k, rather than the l_k, which must be equal beyond

