Page 143 - Jolliffe I. Principal Component Analysis


6. Choosing a Subset of Principal Components or Variables
in a regression analysis, or a set of predictor variables in a discriminant
analysis, is a different type of problem, as criteria external to x must be
considered. Variable selection in regression is the subject of Section 8.5. The
related problem of choosing which PCs to include in a regression analysis
or discriminant analysis is discussed in Sections 8.2 and 9.1, respectively.
6.1 How Many Principal Components?

In this section we present a number of rules for deciding how many PCs
should be retained in order to account for most of the variation in x (or
in the standardized variables x* in the case of a correlation matrix-based
PCA).
In some circumstances the last few, rather than the first few, PCs are of
interest, as was discussed in Section 3.4 (see also Sections 3.7, 6.3, 8.4, 8.6
and 10.1). In the present section, however, the traditional idea of trying
to reduce dimensionality by replacing the p variables by the first m PCs
(m<p) is adopted, and the possible virtues of the last few PCs are ignored.
The first three types of rule for choosing m, described in Sections 6.1.1–
6.1.3, are very much ad hoc rules-of-thumb, whose justification, despite
some attempts to put them on a more formal basis, is still mainly that
they are intuitively plausible and that they work in practice. Section 6.1.4
discusses rules based on formal tests of hypothesis. These make distribu-
tional assumptions that are often unrealistic, and they frequently seem to
retain more variables than are necessary in practice. In Sections 6.1.5 and 6.1.6
a number of statistically based rules, most of which do not require distri-
butional assumptions, are described. Several use computationally intensive
methods such as cross-validation and bootstrapping. Some procedures that
have been suggested in the context of atmospheric science are presented
briefly in Section 6.1.7, and Section 6.1.8 provides some discussion of a
number of comparative studies, and a few comments on the relative merits
of various rules.

6.1.1 Cumulative Percentage of Total Variation
Perhaps the most obvious criterion for choosing m, which has already been
informally adopted in some of the examples of Chapters 4 and 5, is to
select a (cumulative) percentage of total variation which one desires that
the selected PCs contribute, say 80% or 90%. The required number of
PCs is then the smallest value of m for which this chosen percentage is
exceeded. It remains to define what is meant by ‘percentage of variation
accounted for by the first m PCs,’ but this poses no real problem. Principal
components are successively chosen to have the largest possible variance,
and the variance of the kth PC is $l_k$. Furthermore,
$\sum_{k=1}^{p} l_k = \sum_{j=1}^{p} s_{jj}$,
