Page 143 - Jolliffe I. Principal Component Analysis

6. Choosing a Subset of Principal Components or Variables
                              in a regression analysis, or a set of predictor variables in a discriminant
                              analysis, is a different type of problem as criteria external to x must be
                              considered. Variable selection in regression is the subject of Section 8.5. The
                              related problem of choosing which PCs to include in a regression analysis
                              or discriminant analysis is discussed in Sections 8.2 and 9.1, respectively.

                              6.1 How Many Principal Components?

                              In this section we present a number of rules for deciding how many PCs
                              should be retained in order to account for most of the variation in x (or
                              in the standardized variables x* in the case of a correlation matrix-based
                              PCA).
                                In some circumstances the last few, rather than the first few, PCs are of
                              interest, as was discussed in Section 3.4 (see also Sections 3.7, 6.3, 8.4, 8.6
                              and 10.1). In the present section, however, the traditional idea of trying
                              to reduce dimensionality by replacing the p variables by the first m PCs
                              (m<p) is adopted, and the possible virtues of the last few PCs are ignored.
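
A minimal sketch of this reduction, assuming NumPy and a data matrix X with n rows (observations) and p columns (variables). The helper name first_m_pc_scores, the synthetic data, and the choice m = 2 are illustrative assumptions, not anything prescribed in the text; how m should actually be chosen is the subject of the rules that follow.

```python
import numpy as np

def first_m_pc_scores(X, m, use_correlation=False):
    """Hypothetical helper: scores on the first m PCs, plus all p PC variances."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each variable
    if use_correlation:
        Xc = Xc / Xc.std(axis=0, ddof=1)    # standardize: correlation matrix-based PCA
    S = np.cov(Xc, rowvar=False)            # p x p covariance (or correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :m]            # n x m matrix that replaces the n x p data
    return scores, eigvals

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

scores, variances = first_m_pc_scores(X, m=2)   # m = 2 < p = 4, chosen arbitrarily
```

Here the p = 4 original variables are replaced by m = 2 columns of PC scores, and the last p - m PCs are simply discarded.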
                                The first three types of rule for choosing m, described in Sections 6.1.1–
                              6.1.3, are very much ad hoc rules-of-thumb, whose justification, despite
                              some attempts to put them on a more formal basis, is still mainly that
                              they are intuitively plausible and that they work in practice. Section 6.1.4
                              discusses rules based on formal tests of hypothesis. These make
                              distributional assumptions that are often unrealistic, and they frequently seem to
                              retain more variables than are necessary in practice. In Sections 6.1.5 and 6.1.6
                              a number of statistically based rules, most of which do not require
                              distributional assumptions, are described. Several use computationally intensive
                              methods such as cross-validation and bootstrapping. Some procedures that
                              have been suggested in the context of atmospheric science are presented
                              briefly in Section 6.1.7, and Section 6.1.8 provides some discussion of a
                              number of comparative studies, and a few comments on the relative merits
                              of various rules.


                              6.1.1 Cumulative Percentage of Total Variation
                              Perhaps the most obvious criterion for choosing m, which has already been
                              informally adopted in some of the examples of Chapters 4 and 5, is to
                              select a (cumulative) percentage of total variation which one desires that
                              the selected PCs contribute, say 80% or 90%. The required number of
                              PCs is then the smallest value of m for which this chosen percentage is
                              exceeded. It remains to define what is meant by ‘percentage of variation
                              accounted for by the first m PCs,’ but this poses no real problem. Principal
                              components are successively chosen to have the largest possible variance,
                              and the variance of the kth PC is $l_k$. Furthermore, $\sum_{k=1}^{p} l_k = \sum_{j=1}^{p} s_{jj}$,