Page 150 - Jolliffe I. Principal Component Analysis

                                                        6.1. How Many Principal Components?
                              is based on the assumption of multivariate normality for x, and is only
                              approximately true even then. The second problem is concerned with the
                              fact that unless H_{0,p−2} is rejected, there are several tests to be done, so
                              that the overall significance level of the sequence of tests is not the same
                              as the individual significance levels of each test. Furthermore, it is difficult
                              to get even an approximate idea of the overall significance level because
                              the number of tests done is not fixed but random, and the tests are not
                              independent of each other. It follows that, although the testing sequence
                              suggested above can be used to estimate m, it is dangerous to treat the
                              procedure as a formal piece of statistical inference, as significance levels are
                              usually unknown. The reverse sequence H_{00}, H_{01}, ... can be used instead
                              until the first non-rejection occurs (Jackson, 1991, Section 2.6), but this
                              suffers from similar problems.
                                The procedure could be added to the list of ad hoc rules, but it has
                              one further, more practical, disadvantage, namely that in nearly all real
                              examples it tends to retain more PCs than are really necessary. Bartlett
                              (1950), in introducing the procedure for correlation matrices, refers to it
                              as testing how many of the PCs are statistically significant, but ‘statistical
                              significance’ in the context of these tests does not imply that a PC accounts
                              for a substantial proportion of the total variation. For correlation matrices,
                              Jolliffe (1970) found that the rule often corresponds roughly to choosing a
                              cut-off l^* of about 0.1 to 0.2 in the method of Section 6.1.2. This is much
                              smaller than is recommended in that section, and occurs because defining
                              unimportant PCs as those with variances equal to that of the last PC is
                              not necessarily a sensible way of finding m. If this definition is acceptable,
                              as it may be if the model of Tipping and Bishop (1999a) (see Section 3.9) is
                              assumed, for example, then the sequential testing procedure may produce
                              satisfactory results, but it is easy to construct examples where the method
                              gives silly answers. For instance, if there is one near-constant relationship
                              among the elements of x, with a much smaller variance than any other
                              PC, then the procedure rejects H_{0,p−2} and declares that all PCs need to
                              be retained, regardless of how nearly equal are the next few eigenvalues.
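
The failure just described can be reproduced numerically. What follows is a minimal sketch, not code from the book. It assumes the usual likelihood-ratio statistic for H_{0,q} (equality of the last p − q eigenvalues), n[(p − q) ln(mean of the tail) − sum of ln l_k over the tail], referred to a chi-squared distribution with (p − q + 2)(p − q − 1)/2 degrees of freedom, with no small-sample correction applied to n. It also assumes the stopping rule m = q* + 1, where q* is the first q at which H_{0,q} is rejected, with m = p when q* = p − 2. The function names, the sample size n = 100 and the eigenvalues are hypothetical, chosen only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df,
    using the finite closed-form series for even and odd df."""
    if x <= 0:
        return 1.0
    e = math.exp(-x / 2.0)
    if df % 2 == 0:
        # e^{-x/2} * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2.0 * r)
            total += term
        return e * total
    # Odd df: normal tail term plus a finite sum of odd powers of sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2.0 * r - 1.0)
        total += term
    density = e / math.sqrt(2.0 * math.pi)  # standard normal density at chi
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


def equality_stat(tail, n):
    """Assumed statistic for 'all eigenvalues in `tail` are equal'."""
    k = len(tail)
    mean = sum(tail) / k
    stat = n * (k * math.log(mean) - sum(math.log(v) for v in tail))
    df = (k + 2) * (k - 1) // 2
    return stat, df


def choose_m(eigs, n, alpha=0.05):
    """Test H_{0,p-2}, H_{0,p-3}, ... and stop at the first rejection.
    `eigs` must be sorted in decreasing order."""
    p = len(eigs)
    for q in range(p - 2, -1, -1):
        stat, df = equality_stat(eigs[q:], n)
        if chi2_sf(stat, df) < alpha:
            return p if q == p - 2 else q + 1
    return 0  # nothing rejected; returning 0 is a convention of this sketch


# Three nearly equal eigenvalues, plus one near-constant relationship.
with_tiny = [3.0, 1.02, 1.0, 0.98, 0.001]
without_tiny = [3.0, 1.02, 1.0, 0.98]

print(choose_m(with_tiny, n=100))     # 5: H_{0,p-2} is rejected at once
print(choose_m(without_tiny, n=100))  # 1
```

In this sketch the single tiny eigenvalue forces all five PCs to be retained, even though the second, third and fourth eigenvalues are almost identical; once it is removed, the same rule retains one PC.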
                                The method of this section is similar in spirit to, though more formalized
                              than, one formulation of the scree graph. Looking for the first ‘shallow’
                              slope in the graph corresponds to looking for the first of two consecutive
                              eigenvalues that are nearly equal. The scree graph differs from the formal
                              testing procedure in that it starts from the largest eigenvalue and compares
                              consecutive eigenvalues two at a time, whereas the tests start with
                              the smallest eigenvalues and compare blocks of two, three, four and so on.
                              Another difference is that the ‘elbow’ point is retained in Cattell’s formulation
                              of the scree graph, but excluded in the testing procedure. The scree
                              graph is also more subjective but, as has been stated above, the objectivity
                              of the testing procedure is something of an illusion.
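
The scree-graph formulation described in this paragraph can be sketched as a scan from the largest eigenvalue downwards. This is only an illustration under stated assumptions: the function name and the relative tolerance `rel_tol` are hypothetical stand-ins for the visual judgement of a ‘shallow’ slope, which in practice is subjective, and the eigenvalues are invented.

```python
def scree_first_shallow(eigs, rel_tol=0.1):
    """Scan from the largest eigenvalue and return m, the (1-based) position
    of the first eigenvalue that is nearly equal to its successor.
    `eigs` must be sorted in decreasing order.
    'Nearly equal' here means the drop to the next eigenvalue is at most
    rel_tol times the current eigenvalue (an assumed, arbitrary criterion)."""
    for k in range(len(eigs) - 1):
        if eigs[k] - eigs[k + 1] <= rel_tol * eigs[k]:
            return k + 1  # the 'elbow' point itself is retained
    return len(eigs)      # no shallow slope found: keep everything


eigs = [3.0, 1.02, 1.0, 0.98, 0.001]
print(scree_first_shallow(eigs))  # 2: the elbow is the second eigenvalue
```

Because the scan stops at the first shallow slope, the very small last eigenvalue plays no part in the result, in contrast to a procedure that begins with the smallest eigenvalues.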
                                Cattell’s original formulation of the scree graph differs from the above
                              since it is differences l_{k−1} − l_k, rather than l_k, which must be equal beyond