Page 176 - Jolliffe I. Principal Component Analysis

                              examples given by Tanaka and Mori (1997) the decision on when to stop
                              deleting variables appears to be rather subjective.
                                Mori et al. (1999) propose that the subsets selected in modified PCA
                              are also assessed by means of a PRESS criterion, similar to that defined in
                              equation (6.1.3), except that ${}_m\tilde{x}_{ij}$ is replaced by the prediction of $x_{ij}$ found
                              from modified PCA with the $i$th observation omitted. Mori et al. (2000)
                              demonstrate a procedure in which the PRESS criterion is used directly to
                              select variables, rather than as a supplement to another criterion. Tanaka
                              and Mori (1997) show how to evaluate the influence of variables on
                              parameters in a PCA (see Section 10.2 for more on influence), and Mori et al.
                              (2000) implement and illustrate a backward-elimination variable selection
                              algorithm in which variables with the smallest influence are successively
                              removed.
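
The idea of using a leave-one-out PRESS value to drive backward elimination can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of Mori et al.: ordinary PCA of the retained variables, followed by least-squares regression of all variables on the resulting scores, is used here as a simplified stand-in for modified PCA. The function names press_subset and backward_eliminate, and the arguments n_keep and m, are hypothetical.

```python
import numpy as np


def press_subset(X, subset, m):
    """Leave-one-out PRESS for predicting ALL variables from m PCs of a subset.

    Assumption: ordinary PCA on the subset plus least-squares regression
    stands in for modified PCA; this is a sketch, not the published method.
    """
    X = np.asarray(X, dtype=float)
    subset = list(subset)
    press = 0.0
    for i in range(X.shape[0]):
        # Fit everything with the ith observation omitted.
        train = np.delete(X, i, axis=0)
        mean = train.mean(axis=0)
        train_c = train - mean
        # Principal axes of the retained variables only.
        _, _, vt = np.linalg.svd(train_c[:, subset], full_matrices=False)
        loadings = vt[:m].T
        scores = train_c[:, subset] @ loadings
        # Regress all (centred) variables on the subset's PC scores.
        coef, *_ = np.linalg.lstsq(scores, train_c, rcond=None)
        # Predict the omitted observation from its retained variables alone.
        z_i = (X[i, subset] - mean[subset]) @ loadings
        x_hat = mean + z_i @ coef
        press += np.sum((x_hat - X[i]) ** 2)
    return float(press)


def backward_eliminate(X, n_keep, m):
    """Greedily drop the variable whose removal leaves the smallest PRESS."""
    kept = list(range(np.asarray(X).shape[1]))
    while len(kept) > n_keep:
        trials = [
            (press_subset(X, [v for v in kept if v != drop], m), drop)
            for drop in kept
        ]
        _, to_drop = min(trials)
        kept.remove(to_drop)
    return kept
```

In this sketch the stopping point is fixed in advance by n_keep. A rule based on how quickly PRESS rises as variables are removed could be substituted, but the choice of threshold would again involve some judgement.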
                                Hawkins and Eplett (1982) describe a method which can be used for
                              selecting a subset of variables in regression; their technique and an
                              earlier one introduced by Hawkins (1973) are discussed in Sections 8.4 and
                              8.5. Hawkins and Eplett (1982) note that their method is also potentially
                              useful for selecting a subset of variables in situations other than multiple
                              regression, but, as with the RV-coefficient, no numerical example is given
                              in the original paper. Krzanowski (1987a,b) describes a methodology, using
                              principal components together with Procrustes rotation, for selecting
                              subsets of variables. As his main objective is preserving ‘structure’ such as
                              groups in the data, we postpone detailed discussion of his technique until
                              Section 9.2.2.



                              6.4 Examples Illustrating Variable Selection

                              Two examples are presented here; two other relevant examples are given in
                              Section 8.7.


                              6.4.1 Alate adelges (Winged Aphids)

                              These data were first presented by Jeffers (1967) and comprise 19 different
                              variables measured on 40 winged aphids. A description of the variables,
                              together with the correlation matrix and the coefficients of the first four
                              PCs based on the correlation matrix, is given by Jeffers (1967) and will
                              not be reproduced here. For 17 of the 19 variables all of the correlation
                              coefficients are positive, reflecting the fact that 12 variables are lengths
                              or breadths of parts of each individual, and some of the other (discrete)
                              variables also measure aspects of the size of each aphid. Not surprisingly,
                              the first PC based on the correlation matrix accounts for a large proportion
                              (73.0%) of the total variation, and this PC is a measure of overall size of
                              each aphid. The second PC, accounting for 12.5% of total variation, has its