Page 169 - Jolliffe I. Principal Component Analysis
                                    6. Choosing a Subset of Principal Components or Variables
                              methods, including some based on cluster analyses of variables (see Sec-
                              tion 9.2) were also examined but, as these do not use the PCs to select
                              variables, they are not described here. Three main types of method using
                              PCs were examined.
                              (i) Associate one variable with each of the last $m_1^*$ ($= p - m_1$) PCs
                                 and delete those $m_1^*$ variables. This can either be done once only
                                 or iteratively. In the latter case a second PCA is performed on the
                                 $m_1$ remaining variables, and a further set of $m_2^*$ variables is
                                 deleted, if appropriate. A third PCA can then be done on the
                                 $p - m_1^* - m_2^*$ variables, and the procedure is repeated until no
                                 further deletions are considered necessary. The choice of
                                 $m_1^*, m_2^*, \ldots$ is based on a criterion determined by the size
                                 of the eigenvalues $l_k$.
                                 The reasoning behind this method is that small eigenvalues correspond
                                 to near-constant relationships among a subset of variables. If one of
                                 the variables involved in such a relationship is deleted (a fairly obvious
                                 choice for deletion is the variable with the highest coefficient in abso-
                                 lute value in the relevant PC) little information is lost. To decide on
                                 how many variables to delete, the criterion $l_k$ is used as described in
                                 Section 6.1.2. The criterion $t_m$ of Section 6.1.1 was also tried by Jolliffe
                                 (1972), but shown to be less useful.
                              (ii) Associate a set of $m^*$ variables en bloc with the last $m^*$ PCs, and
                                 then delete these variables. Jolliffe (1970, 1972) investigated this type
                                 of method, with the $m^*$ variables either chosen to maximize sums of
                                 squares of coefficients in the last $m^*$ PCs or to be those $m^*$ variables
                                 that are best predicted by regression on the first $m = p - m^*$ PCs.
                                 Choice of $m^*$ is again based on the sizes of the $l_k$. Such methods
                                 were found to be unsatisfactory, as they consistently failed to select
                                 an appropriate subset for some simple correlation structures.
                             (iii) Associate one variable with each of the first m PCs, namely the variable
                                 not already chosen with the highest coefficient in absolute value in
                                 each successive PC. These m variables are retained, and the remaining
                                 $m^* = p - m$ are deleted. The arguments leading to this approach are
                                 twofold. First, it is an obvious complementary approach to (i) and,
                                 second, in cases where there are groups of highly correlated variables it
                                 is designed to select just one variable from each group. This will happen
                                 because there will be exactly one high-variance PC associated with each
                                 group (see Section 3.8). The approach is a plausible one, as a single
                                 variable from each group should preserve most of the information given
                                 by that group when all variables in the group are highly correlated.
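
The non-iterative version of method (i) and method (iii) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the procedure as implemented in Jolliffe (1972): the function names are hypothetical, the PCA is assumed to be done on the correlation matrix, the number of retained variables m is supplied directly rather than derived from the eigenvalue criterion of Section 6.1.2, and for method (i) it is assumed that the last PC is examined first and that a variable already marked for deletion is skipped.

```python
import numpy as np


def correlation_pcs(X):
    """Eigenvalues (descending) and PC coefficient vectors (as columns)
    of the sample correlation matrix of X (rows are observations)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def _pick_one_per_pc(A, pc_indices):
    """For each PC in turn, pick the not-yet-picked variable with the
    largest absolute coefficient in that PC."""
    available = np.ones(A.shape[0], dtype=bool)
    picked = []
    for k in pc_indices:
        coef = np.where(available, np.abs(A[:, k]), -np.inf)
        j = int(np.argmax(coef))
        picked.append(j)
        available[j] = False
    return picked


def select_method_i(X, m):
    """Non-iterative method (i): delete one variable for each of the
    last p - m PCs and return the indices of the m retained variables."""
    _, A = correlation_pcs(X)
    p = A.shape[1]
    # Assumption: start from the last PC (smallest eigenvalue).
    deleted = _pick_one_per_pc(A, range(p - 1, m - 1, -1))
    return sorted(set(range(p)) - set(deleted))


def select_method_iii(X, m):
    """Method (iii): retain one variable for each of the first m PCs."""
    _, A = correlation_pcs(X)
    return sorted(_pick_one_per_pc(A, range(m)))


# Synthetic example (made-up data): two groups of highly correlated
# variables, columns 0-2 driven by z1 and columns 3-4 driven by z2.
rng = np.random.default_rng(0)
n = 500
z1, z2 = rng.standard_normal((2, n))
group_a = z1[:, None] + 0.1 * rng.standard_normal((n, 3))
group_b = z2[:, None] + 0.3 * rng.standard_normal((n, 2))
X = np.hstack([group_a, group_b])

kept_i = select_method_i(X, m=2)
kept_iii = select_method_iii(X, m=2)
print("method (i) retains:", kept_i)
print("method (iii) retains:", kept_iii)
```

With m = 2 on data of this kind, each method would be expected to retain one variable from each correlated group, which is the behaviour the argument for method (iii) describes.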
                                In Jolliffe (1972) comparisons were made, using simulated data, between
                              non-iterative versions of method (i) and method (iii), called methods B2, B4
                              respectively, and with several other subset selection methods that did not
                              use the PCs. The results showed that the PC methods B2, B4 retained the