Page 170 - Jolliffe I. Principal Component Analysis

6.3. Selecting a Subset of Variables
                              ‘best’ subsets more often than the other methods considered, but they also
                              selected ‘bad,’ as opposed to ‘good’ or ‘moderate’, subsets more frequently
                              than the other methods. Method B4 was most extreme in this respect; it
                              selected ‘best’ and ‘bad’ subsets more frequently than any other method,
                              and ‘moderate’ or ‘good’ subsets less frequently.
                                Similarly, for various real data sets Jolliffe (1973) found that none of
                              the variable selection methods was uniformly best, but several of them,
                              including B2 and B4, found reasonable subsets in most cases.
                                McCabe (1984) adopted a somewhat different approach to the variable
                              selection problem. He started from the fact that, as has been seen in
                              Chapters 2 and 3, PCs satisfy a number of different optimality criteria. A subset
                              of the original variables that optimizes one of these criteria is termed a set
                              of principal variables by McCabe (1984). Property A1 of Sections 2.1 and 3.1
                              is uninteresting as it simply leads to a subset of variables whose variances
                              are largest, but other properties lead to one of these four criteria:
                              (a) Minimize $\prod_{j=1}^{m^*} \theta_j$

                              (b) Minimize $\sum_{j=1}^{m^*} \theta_j$

                              (c) Minimize $\sum_{j=1}^{m^*} \theta_j^2$

                              (d) Minimize $\sum_{j=1}^{m^-} \rho_j^2$

                              where $\theta_j$, $j = 1, 2, \ldots, m^*$, are the eigenvalues of the conditional
                              covariance (or correlation) matrix of the $m^*$ deleted variables, given the
                              values of the $m$ selected variables, and $\rho_j$, $j = 1, 2, \ldots, m^-$, with
                              $m^- = \min(m, m^*)$, are the canonical correlations between the set of $m^*$
                              deleted variables and the set of $m$ selected variables.
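
As a rough illustration of how criteria (a) to (c) might be evaluated for one candidate subset, the Python sketch below forms the conditional covariance matrix of the deleted variables as a Schur complement and takes its eigenvalues. The function names, the use of NumPy and the small covariance matrix are assumptions made for this example; they are not taken from McCabe (1984).

```python
import numpy as np


def conditional_covariance(S, selected):
    """Covariance of the deleted variables given the selected ones.

    Computed as the Schur complement S_dd - S_ds S_ss^{-1} S_sd, where
    's' indexes the selected variables and 'd' the deleted ones.
    """
    sel = list(selected)
    sel_set = set(sel)
    dele = [i for i in range(S.shape[0]) if i not in sel_set]
    S_ss = S[np.ix_(sel, sel)]
    S_ds = S[np.ix_(dele, sel)]
    S_dd = S[np.ix_(dele, dele)]
    return S_dd - S_ds @ np.linalg.solve(S_ss, S_ds.T)


def mccabe_criteria(S, selected):
    """Values of criteria (a), (b), (c) for one subset of selected variables."""
    # theta holds the eigenvalues of the conditional covariance matrix.
    theta = np.linalg.eigvalsh(conditional_covariance(S, selected))
    return {
        "a": float(np.prod(theta)),      # product of the eigenvalues
        "b": float(np.sum(theta)),       # sum of the eigenvalues
        "c": float(np.sum(theta ** 2)),  # sum of squared eigenvalues
    }


# Hypothetical 4-variable covariance matrix, chosen only for illustration.
S_example = np.array([
    [4.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
criteria_example = mccabe_criteria(S_example, selected=[0, 2])
```

With variables 0 and 2 selected in this example, the conditional covariance matrix of variables 1 and 3 is diagonal with entries 2 and 0.5, so the three criteria take the values 1, 2.5 and 4.25 respectively. Comparing subsets then amounts to repeating the calculation for each candidate and keeping the one with the smallest value of the chosen criterion.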
                                Consider, for example, Property A4 of Sections 2.1 and 3.1, where
                              $\det(\Sigma_y)$ (or $\det(S_y)$ for samples) is to be maximized. In PCA, $y$
                              consists of orthonormal linear functions of $x$; for principal variables $y$ is
                              a subset of $x$.
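
To make the subset version of Property A4 concrete, the following Python sketch searches all subsets of a given size and keeps the one whose covariance submatrix has the largest determinant. The exhaustive search, the function name and the 3-variable matrix are assumptions made for illustration only; the text does not prescribe a search strategy, and enumerating every subset becomes expensive as the number of variables grows.

```python
from itertools import combinations

import numpy as np


def best_subset_by_determinant(S, m):
    """Return the size-m index subset maximizing det(S_y), and that determinant."""
    best_subset, best_det = None, -np.inf
    for subset in combinations(range(S.shape[0]), m):
        idx = list(subset)
        # Determinant of the covariance submatrix of the candidate variables.
        d = np.linalg.det(S[np.ix_(idx, idx)])
        if d > best_det:
            best_subset, best_det = subset, d
    return best_subset, float(best_det)


# Hypothetical covariance matrix for three variables.
S_small = np.array([
    [5.0, 2.0, 0.0],
    [2.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])
subset_example, det_example = best_subset_by_determinant(S_small, m=2)
```

For this matrix the three candidate pairs have determinants 11, 10 and 5, so the pair of variables 0 and 1 is selected.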
                                From a well-known result concerning partitioned matrices,
                              $\det(\Sigma) = \det(\Sigma_y)\,\det(\Sigma_{y^* \cdot y})$, where
                              $\Sigma_{y^* \cdot y}$ is the matrix of conditional covariances for those
                              variables not in $y$, given the value of $y$. Because $\Sigma$, and hence
                              $\det(\Sigma)$, is fixed for a given random vector $x$, maximizing
                              $\det(\Sigma_y)$ is equivalent to minimizing $\det(\Sigma_{y^* \cdot y})$. Now
                              $\det(\Sigma_{y^* \cdot y}) = \prod_{j=1}^{m^*} \theta_j$, so that Property A4
                              becomes McCabe’s criterion (a) when deriving principal variables. Other
                              properties of Chapters 2 and 3 can similarly be shown to be equivalent to