Page 177 - Jolliffe I. Principal Component Analysis

6. Choosing a Subset of Principal Components or Variables
Table 6.4. Subsets of selected variables, Alate adelges.
(Each row corresponds to a selected subset, with × denoting a selected variable.)
                                               Variables
                                   5   8   9  11  13  14  17  18  19
McCabe, using criterion (a)
  Three variables  best                        ×           ×       ×
                   second best             ×   ×           ×
  Four variables   best                    ×   ×           ×       ×
                   second best     ×       ×   ×                   ×
Jolliffe, using criteria B2, B4
  Three variables  B2                  ×       ×       ×
                   B4                          ×   ×       ×
  Four variables   B2              ×   ×       ×       ×
                   B4              ×           ×   ×       ×
Criterion (6.3.4)
  Three variables                              ×   ×       ×
  Four variables                   ×           ×       ×       ×
Criterion (6.3.5)
  Three variables                  ×               ×           ×
  Four variables                   ×           ×   ×           ×


                              largest coefficients on five of the seven discrete variables, and the third PC
                              (3.9%) is almost completely dominated by one variable, number of antennal
                              spines. This variable, which is one of the two variables negatively correlated
                              with size, has a coefficient in the third PC that is five times as large as any
                              other variable.
                                Table 6.4 gives various subsets of variables selected by Jolliffe (1973)
                              and by McCabe (1982) in an earlier version of his 1984 paper that included
                              additional examples. The subsets given by McCabe (1982) are the best two
                              according to his criterion (a), whereas those from Jolliffe (1973) are selected
                              by the criteria B2 and B4 discussed above. Only the results for m = 3 are
                              given in Jolliffe (1973), but Table 6.4 also gives results for m = 4 using his
                              methods. In addition, the table includes the ‘best’ 3- and 4-variable subsets
                              according to the criteria (6.3.4) and (6.3.5).
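To make the subsets in Table 6.4 easier to compare across the four methods, one can tally how many methods select each variable at least once. The following is a minimal sketch, not part of the original analysis. The names `SUBSETS` and `methods_per_variable` are hypothetical, the variable numbers are transcribed from the column alignment of Table 6.4 and should be checked against the printed table, and the variables are assumed to be numbered 1 to 19.

```python
from collections import defaultdict

# Subsets transcribed from Table 6.4 (assumption: read off the column
# alignment of the table). Keys are (method, row label).
SUBSETS = {
    ("McCabe (a)", "3, best"): {11, 17, 19},
    ("McCabe (a)", "3, second best"): {9, 11, 17},
    ("McCabe (a)", "4, best"): {9, 11, 17, 19},
    ("McCabe (a)", "4, second best"): {5, 9, 11, 19},
    ("Jolliffe B2/B4", "3, B2"): {8, 11, 14},
    ("Jolliffe B2/B4", "3, B4"): {11, 13, 17},
    ("Jolliffe B2/B4", "4, B2"): {5, 8, 11, 14},
    ("Jolliffe B2/B4", "4, B4"): {5, 11, 13, 17},
    ("Criterion (6.3.4)", "3"): {11, 13, 17},
    ("Criterion (6.3.4)", "4"): {5, 11, 14, 18},
    ("Criterion (6.3.5)", "3"): {5, 13, 18},
    ("Criterion (6.3.5)", "4"): {5, 11, 13, 18},
}


def methods_per_variable(subsets):
    """Map each variable to the set of methods that select it at least once."""
    chosen = defaultdict(set)
    for (method, _label), variables in subsets.items():
        for v in variables:
            chosen[v].add(method)
    return dict(chosen)


chosen = methods_per_variable(SUBSETS)
for v in sorted(chosen):
    print(f"variable {v:2d}: selected by {len(chosen[v])} of 4 methods")

# Assumption: the data set has 19 variables, numbered 1..19.
never_selected = sorted(set(range(1, 20)) - set(chosen))
print("never selected:", never_selected)
```

Counting methods rather than rows avoids giving extra weight to methods that contribute more rows to the table.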
                                There is considerable overlap between the various subsets selected. In
                              particular, variable 11 is an almost universal choice and variables 5, 13 and
                              17 also appear in subsets selected by at least three of the four methods.
                              Conversely, variables {1–4, 6, 7, 10, 12, 15, 16} appear in none of the subsets in
                              Table 6.4. It should be noted that variable 11 is ‘number of antennal spines,’
                              which, as discussed above, dominates the third PC. Variables 5 and 17,
                              measuring number of spiracles and number of ovipositor spines, respectively, are