Page 146 - Jolliffe I. Principal Component Analysis
                                                        6.1. How Many Principal Components?
                                As well as these intuitive justifications, Kaiser (1960) put forward a
                              number of other reasons for a cut-off at $l_k = 1$. It must be noted,
                              however, that most of the reasons are pertinent to factor analysis (see
                              Chapter 7), rather than PCA, although Kaiser refers to PCs in discussing
                              one of them.
                                It can be argued that a cut-off at $l_k = 1$ retains too few variables.
                              Consider a variable which, in the population, is more-or-less independent
                              of all other variables. In a sample, such a variable will have small
                              coefficients in $(p - 1)$ of the PCs but will dominate one of the PCs,
                              whose variance $l_k$ will be close to 1 when using the correlation matrix.
                              As the variable provides independent information from the other variables
                              it would be unwise to delete it. However, deletion will occur if Kaiser's
                              rule is used, and if, due to sampling variation, $l_k < 1$. It is therefore
                              advisable to choose a cut-off $l^*$ lower than 1, to allow for sampling
                              variation. Jolliffe (1972) suggested, based on simulation studies, that
                              $l^* = 0.7$ is roughly the correct level. Further discussion of this
                              cut-off level will be given with respect to examples in Sections 6.2
                              and 6.4.
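
The effect of lowering the cut-off can be illustrated with a minimal sketch. The function name `retain_by_eigenvalue` and the eigenvalues below are hypothetical, chosen only for illustration; the eigenvalues are assumed to come from a correlation matrix, so they sum to $p$.

```python
def retain_by_eigenvalue(eigenvalues, cutoff=0.7):
    """Return the 1-based indices k of the PCs whose variance l_k exceeds the cut-off.

    Assumes the eigenvalues come from a correlation matrix.
    """
    return [k for k, l_k in enumerate(eigenvalues, start=1) if l_k > cutoff]


# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to p = 5).
eigenvalues = [2.1, 1.2, 0.9, 0.5, 0.3]

kaiser = retain_by_eigenvalue(eigenvalues, cutoff=1.0)    # Kaiser's rule, l* = 1
jolliffe = retain_by_eigenvalue(eigenvalues, cutoff=0.7)  # lower cut-off, l* = 0.7
```

With these illustrative values the third PC, whose variance of 0.9 falls just below 1, is deleted by Kaiser's rule but kept when $l^* = 0.7$.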
                                The rule just described is specifically designed for correlation matrices,
                              but it can be easily adapted for covariance matrices by taking as a cut-off
                              $l^*$ the average value $\bar{l}$ of the eigenvalues or, better, a somewhat
                              lower cut-off such as $l^* = 0.7\bar{l}$. For covariance matrices with
                              widely differing variances, however, this rule and the one based on $t_k$
                              from Section 6.1.1 retain very few (arguably, too few) PCs, as will be seen
                              in the examples of Section 6.2.
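
The covariance-matrix version of the rule might be sketched as follows. The function name `retain_by_mean_eigenvalue`, its `fraction` parameter, and the eigenvalues are hypothetical and serve only to show how the cut-off scales with $\bar{l}$.

```python
def retain_by_mean_eigenvalue(eigenvalues, fraction=0.7):
    """Return the 1-based indices k of the PCs with l_k > fraction * (mean eigenvalue)."""
    mean_l = sum(eigenvalues) / len(eigenvalues)  # average eigenvalue, l-bar
    cutoff = fraction * mean_l                    # cut-off l* = fraction * l-bar
    return [k for k, l_k in enumerate(eigenvalues, start=1) if l_k > cutoff]


# Hypothetical eigenvalues of a covariance matrix (mean eigenvalue = 20).
cov_eigenvalues = [50.0, 25.0, 15.0, 7.0, 3.0]

kept_at_mean = retain_by_mean_eigenvalue(cov_eigenvalues, fraction=1.0)   # l* = l-bar
kept_at_lower = retain_by_mean_eigenvalue(cov_eigenvalues, fraction=0.7)  # l* = 0.7 l-bar
```

Setting `fraction=1.0` gives the plain average-eigenvalue cut-off, and `fraction=0.7` gives the somewhat lower cut-off suggested in the text.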
                                An alternative way of looking at the sizes of individual variances is to
                              use the so-called broken stick model. If we have a stick of unit length,
                              broken at random into $p$ segments, then it can be shown that the expected
                              length of the $k$th longest segment is

                              \[ l_k^* = \frac{1}{p} \sum_{j=k}^{p} \frac{1}{j}. \]

                              One way of deciding whether the proportion of variance accounted for by
                              the $k$th PC is large enough for that component to be retained is to
                              compare the proportion with $l_k^*$. Principal components for which the
                              proportion exceeds $l_k^*$ are then retained, and all other PCs deleted.
                              Tables of $l_k^*$ are available for various values of $p$ and $k$ (see,
                              for example, Legendre and Legendre (1983, p. 406)).
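
In place of tables, the values $l_k^*$ can be computed directly from the formula above. The following is a minimal sketch under stated assumptions: the function names `broken_stick` and `retain_by_broken_stick` and the eigenvalues are hypothetical, and the comparison keeps every PC whose proportion of variance exceeds $l_k^*$, as described in the text.

```python
def broken_stick(p):
    """Expected segment lengths l*_k = (1/p) * sum_{j=k}^{p} 1/j, for k = 1, ..., p."""
    return [sum(1.0 / j for j in range(k, p + 1)) / p for k in range(1, p + 1)]


def retain_by_broken_stick(eigenvalues):
    """Return the 1-based indices k of the PCs whose proportion of variance exceeds l*_k."""
    total = sum(eigenvalues)
    proportions = [l_k / total for l_k in eigenvalues]
    expected = broken_stick(len(eigenvalues))
    return [
        k
        for k, (prop, l_star) in enumerate(zip(proportions, expected), start=1)
        if prop > l_star
    ]


# Hypothetical eigenvalues for p = 4 variables, in decreasing order.
stick_eigenvalues = [2.6, 0.8, 0.4, 0.2]
stick_lengths = broken_stick(4)
stick_kept = retain_by_broken_stick(stick_eigenvalues)
```

For $p = 4$ the expected lengths are $25/48$, $13/48$, $7/48$ and $3/48$, which sum to 1, as they must for a stick of unit length.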
                              6.1.3 The Scree Graph and the Log-Eigenvalue Diagram
                              The first two rules described above usually involve a degree of
                              subjectivity in the choice of cut-off levels, $t^*$ and $l^*$ respectively.
                              The scree graph, which was discussed and named by Cattell (1966) but which
                              was already in common use, is even more subjective in its usual form, as it
                              involves looking at a plot of $l_k$ against $k$ (see Figure 6.1, which is
                              discussed in detail in Section 6.2) and deciding at which value of $k$ the
                              slopes of lines joining