Page 148 - Jolliffe I. Principal Component Analysis

6.1. How Many Principal Components?
                              which the scree graph defines a more-or-less straight line, not necessarily
                              horizontal. The first point on the straight line is then taken to be the last
                              factor/component to be retained. If there are two or more straight lines
                              formed by the lower eigenvalues, then the cut-off is taken at the upper (left-
                              hand) end of the left-most straight line. Cattell (1966) discusses at some
                              length whether the left-most point on the straight line should correspond
                              to the first excluded factor or the last factor to be retained. He concludes
                              that it is preferable to include this factor, although both variants are used
                              in practice.

                                The rule in Section 6.1.1 is based on $t_m = \sum_{k=1}^{m} l_k$, the rule in
                              Section 6.1.2 looks at individual eigenvalues $l_k$, and the current rule, as
                              applied to PCA, uses $l_{k-1} - l_k$ as its criterion. There is, however, no
                              formal numerical cut-off based on $l_{k-1} - l_k$ and, in fact, judgments of
                              when $l_{k-1} - l_k$ stops being large (steep) will depend on the relative
                              values of $l_{k-1} - l_k$ and $l_k - l_{k+1}$, as well as the absolute value of
                              $l_{k-1} - l_k$. Thus the rule is based subjectively on the second, as well as
                              the first, differences among the $l_k$. Because of this, it is difficult to
                              write down a formal numerical rule
                              and the procedure has until recently remained purely graphical. Tests that
                              attempt to formalize the procedure, due to Bentler and Yuan (1996, 1998),
                              are discussed in Section 6.1.4.
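
The three quantities contrasted above can be computed side by side. The following is a minimal sketch, not a retention rule: the function name `scree_quantities` and the eigenvalues in `eigs` are hypothetical, chosen only to illustrate the cumulative sums, the first differences and the second differences.

```python
from itertools import accumulate


def scree_quantities(eigenvalues):
    """Return (t, first, second) for eigenvalues l_1 >= l_2 >= ... >= l_p.

    t[m-1]    = l_1 + ... + l_m                     (cumulative sum t_m)
    first[i]  = l[i] - l[i+1]                       (the differences l_{k-1} - l_k)
    second[i] = first[i] - first[i+1]               (change in steepness)
    """
    l = sorted(eigenvalues, reverse=True)  # ensure decreasing order
    t = list(accumulate(l))
    first = [l[i] - l[i + 1] for i in range(len(l) - 1)]
    second = [first[i] - first[i + 1] for i in range(len(first) - 1)]
    return t, first, second


# Hypothetical eigenvalues, for illustration only.
eigs = [4.0, 2.0, 1.0, 0.5, 0.25, 0.25]
t, first, second = scree_quantities(eigs)
print("cumulative sums   :", t)
print("first differences :", first)
print("second differences:", second)
```

Printing the first and second differences together shows why no single numerical cut-off suggests itself: whether a given drop counts as steep depends on the drops on either side of it.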

                                Cattell’s formulation, where we look for the point at which $l_{k-1} - l_k$
                              becomes fairly constant for several subsequent values, is perhaps less sub-
                              jective, but still requires some degree of judgment. Both formulations of
                              the rule seem to work well in practice, provided that there is a fairly sharp
                              ‘elbow,’ or change of slope, in the graph. However, if the slope gradually
                              becomes less steep, with no clear elbow, as in Figure 6.1, then the
                              procedure is harder to apply.
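
One way to see how much judgment the straight-line formulation still involves is to try to automate it. The sketch below is an assumption-laden illustration, not Cattell's procedure: the function `first_constant_slope` and its parameters `run` (how many successive differences must agree) and `tol` (how closely they must agree) are hypothetical, and the answer depends on both.

```python
def first_constant_slope(eigenvalues, run=3, tol=0.05):
    """Return the 1-based index of the first point of an approximately
    straight segment of the scree graph, or None if no segment is found.

    A segment is accepted when `run` successive differences l_k - l_{k+1}
    differ from one another by at most `tol`. Both values are arbitrary
    choices (assumptions), which is where the subjectivity enters.
    """
    l = list(eigenvalues)
    diffs = [l[i] - l[i + 1] for i in range(len(l) - 1)]
    for i in range(len(diffs) - run + 1):
        window = diffs[i:i + run]
        if max(window) - min(window) <= tol:
            return i + 1  # diffs[i] starts at the (i+1)-th eigenvalue
    return None


# Hypothetical eigenvalues with a sharp elbow at the third point.
eigs = [5.0, 2.5, 1.0, 0.75, 0.5, 0.25]
k = first_constant_slope(eigs)
print("straight segment starts at point", k)
```

Under the variant in which the first point on the straight line is the last component retained, the returned index is the number of components kept; under the other variant it is one more than that number. With a gradual change of slope, small changes in `tol` or `run` can move the answer, which mirrors the difficulty described above.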
                                A number of methods have been suggested in which the scree plot is
                              compared with a corresponding plot representing given percentiles, often a
                              95 percentile, of the distributions of each variance (eigenvalue) when PCA
                              is done on a ‘random’ matrix. Here ‘random’ usually refers to a correlation
                              matrix obtained from a random sample of n observations on p uncorrelated
                              normal random variables, where n, p are chosen to be the same as for the
                              data set of interest. A number of varieties of this approach, which goes
                              under the general heading parallel analysis, have been proposed in the
                              psychological literature. Parallel analysis dates back to Horn (1965), where
                              it was described as a way of determining the number of factors in factor analysis.
                              Its ideas have since been applied, sometimes inappropriately, to PCA.
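
The comparison with a 'random' matrix can be sketched by simulation. This is a minimal illustration under stated assumptions rather than any published variant: the function names, the number of simulations `n_sim`, the seed and the example eigenvalues are all hypothetical, and counting only the leading run of eigenvalues that exceed the percentiles is one interpretation of where the scree graph 'lies above' the percentile graph.

```python
import numpy as np


def parallel_analysis_thresholds(n, p, n_sim=200, percentile=95.0, seed=0):
    """Empirical percentiles of each ordered eigenvalue of the correlation
    matrix of n observations on p uncorrelated normal variables."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, p))
    for s in range(n_sim):
        x = rng.standard_normal((n, p))          # uncorrelated normal data
        r = np.corrcoef(x, rowvar=False)         # p x p sample correlation matrix
        sims[s] = np.linalg.eigvalsh(r)[::-1]    # eigenvalues, largest first
    return np.percentile(sims, percentile, axis=0)


def n_components_above(eigenvalues, thresholds):
    """Count the leading eigenvalues that exceed their thresholds,
    stopping at the first one that does not (an assumed reading)."""
    above = np.asarray(eigenvalues) > np.asarray(thresholds)
    return len(above) if above.all() else int(np.argmin(above))


# Hypothetical eigenvalues of a 6-variable correlation matrix (they sum to 6).
observed = [3.5, 1.9, 0.3, 0.15, 0.1, 0.05]
thresholds = parallel_analysis_thresholds(n=100, p=6)
m = n_components_above(observed, thresholds)
print("95th percentiles:", np.round(thresholds, 3))
print("components retained:", m)
```

The simulated sample size `n` and dimension `p` are set to match the data set of interest, as the text requires; increasing `n_sim` gives smoother percentile estimates at the cost of more computation.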
                                Most of its variants use simulation to construct the 95 percentiles em-
                              pirically, and some examine ‘significance’ of loadings (eigenvectors), as well
                              as eigenvalues, using similar reasoning. Franklin et al. (1995) cite many of
                              the most relevant references in attempting to popularize parallel analysis
                              amongst ecologists. The idea in versions of parallel analysis that concen-
                              trate on eigenvalues is to retain m PCs, where m is the largest integer for
                              which the scree graph lies above the graph of upper 95 percentiles. Boot-