Page 170 - Jolliffe I. Principal Component Analysis
6.3. Selecting a Subset of Variables
‘best’ subsets more often than the other methods considered, but they also
selected ‘bad,’ as opposed to ‘good’ or ‘moderate’, subsets more frequently
than the other methods. Method B4 was most extreme in this respect; it
selected ‘best’ and ‘bad’ subsets more frequently than any other method,
and ‘moderate’ or ‘good’ subsets less frequently.
Similarly, for various real data sets Jolliffe (1973) found that none of
the variable selection methods was uniformly best, but several of them,
including B2 and B4, found reasonable subsets in most cases.
McCabe (1984) adopted a somewhat different approach to the variable
selection problem. He started from the fact that, as has been seen in Chapters
2 and 3, PCs satisfy a number of different optimality criteria. A subset
of the original variables that optimizes one of these criteria is termed a set
of principal variables by McCabe (1984). Property A1 of Sections 2.1 and 3.1
is uninteresting as it simply leads to a subset of variables whose variances
are largest, but other properties lead to one of these four criteria:
(a) Minimize $\prod_{j=1}^{m^*} \theta_j$
(b) Minimize $\sum_{j=1}^{m^*} \theta_j$
(c) Minimize $\sum_{j=1}^{m^*} \theta_j^2$
(d) Minimize $\sum_{j=1}^{m^-} \rho_j^2$
where $\theta_j$, $j = 1, 2, \ldots, m^*$ are the eigenvalues of the conditional covariance
(or correlation) matrix of the $m^*$ deleted variables, given the values of
the $m$ selected variables, and $\rho_j$, $j = 1, 2, \ldots, m^- = \min(m, m^*)$ are the
canonical correlations between the set of $m^*$ deleted variables and the set
of $m$ selected variables.
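The eigenvalues $\theta_j$ and criteria (a) to (c) can be sketched numerically as follows. This is a minimal illustration, not McCabe's own procedure: the function names `conditional_covariance` and `mccabe_criteria` and the matrix `S_example` are hypothetical, the conditional covariance matrix is computed with the standard partitioned-matrix (Schur complement) formula, and the subset is assumed to satisfy $1 \le m < p$. Criterion (d), which needs canonical correlations, is not covered here.

```python
import numpy as np

def conditional_covariance(S, selected):
    """Covariance of the deleted variables given the selected ones.

    S is a p x p covariance (or correlation) matrix and `selected` holds
    the indices of the m retained variables. The result is the m* x m*
    matrix S_dd - S_ds S_ss^{-1} S_sd, where d = deleted, s = selected.
    """
    S = np.asarray(S, dtype=float)
    sel = np.asarray(sorted(selected))
    dele = np.setdiff1d(np.arange(S.shape[0]), sel)
    S_ss = S[np.ix_(sel, sel)]
    S_ds = S[np.ix_(dele, sel)]
    S_dd = S[np.ix_(dele, dele)]
    # Solve S_ss X = S_sd rather than forming the inverse explicitly.
    return S_dd - S_ds @ np.linalg.solve(S_ss, S_ds.T)

def mccabe_criteria(S, selected):
    """Return criteria (a), (b), (c) for one candidate subset."""
    theta = np.linalg.eigvalsh(conditional_covariance(S, selected))
    return {
        "a": float(np.prod(theta)),    # product of eigenvalues
        "b": float(np.sum(theta)),     # sum of eigenvalues
        "c": float(np.sum(theta**2)),  # sum of squared eigenvalues
    }

# Hypothetical 3 x 3 correlation matrix, used only for illustration.
S_example = np.array([[1.0, 0.8, 0.1],
                      [0.8, 1.0, 0.2],
                      [0.1, 0.2, 1.0]])
criteria_for_first_variable = mccabe_criteria(S_example, [0])
```

Smaller values of each criterion indicate that the selected variables leave less conditional variation in the deleted ones, so candidate subsets of the same size can be compared by evaluating `mccabe_criteria` on each.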
Consider, for example, Property A4 of Sections 2.1 and 3.1, where
$\det(\Sigma_y)$ (or $\det(S_y)$ for samples) is to be maximized. In PCA, $y$ consists
of orthonormal linear functions of $x$; for principal variables $y$ is a subset of
$x$.
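To make the principal-variables version of Property A4 concrete, the sketch below searches all subsets of a given size for the one whose covariance submatrix has the largest determinant. It is an assumption-laden illustration: the function name `best_subset_by_determinant` and the matrix `S_demo` are hypothetical, and the exhaustive search is only practical for small numbers of variables.

```python
from itertools import combinations

import numpy as np

def best_subset_by_determinant(S, m):
    """Return the size-m index tuple maximizing det of the submatrix of S."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    return max(
        combinations(range(p), m),
        key=lambda idx: np.linalg.det(S[np.ix_(idx, idx)]),
    )

# Hypothetical 3 x 3 correlation matrix, used only for illustration.
S_demo = np.array([[1.0, 0.8, 0.1],
                   [0.8, 1.0, 0.2],
                   [0.1, 0.2, 1.0]])
chosen = best_subset_by_determinant(S_demo, 2)  # indices are 0-based
```

For this matrix the pair of variables with the weakest mutual correlation is chosen, since for two standardized variables the determinant is one minus their squared correlation.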
From a well-known result concerning partitioned matrices, $\det(\Sigma) =
\det(\Sigma_y)\,\det(\Sigma_{y^* \cdot y})$, where $\Sigma_{y^* \cdot y}$ is the matrix of conditional covariances for
those variables not in $y$, given the value of $y$. Because $\Sigma$, and hence $\det(\Sigma)$,
is fixed for a given random vector $x$, maximizing $\det(\Sigma_y)$ is equivalent to
minimizing $\det(\Sigma_{y^* \cdot y})$. Now $\det(\Sigma_{y^* \cdot y}) = \prod_{j=1}^{m^*} \theta_j$, so that Property A4
becomes McCabe’s criterion (a) when deriving principal variables. Other
properties of Chapters 2 and 3 can similarly be shown to be equivalent to

