Page 163 - Jolliffe I. Principal Component Analysis

6. Choosing a Subset of Principal Components or Variables
                              second and third largest eigenvalues of the fixed matrix Z, so he argues
                              that m = 2 should be chosen. This implies a movement away from the
                              objective ‘correct’ choice given by the model, back towards what seems to
                              be the inevitable subjectivity of the area.
                                The simulations are replicated 100 times for each of the two noise levels,
                              and give results which are consistent with other studies. Kaiser’s modified
                              rule with a threshold at 2, the broken stick rule, Velicer’s test, and cross-
                              validation rules that stop after the first fall below the threshold—all retain
                              relatively few components. Conversely, Bartlett’s test, cumulative variance
                              with a cut-off of 90%, $\hat{f}_q$ and the approximate jackknife retain greater
                              numbers of PCs. The approximate jackknife displays the strange behaviour
                              of retaining more PCs for larger than for smaller noise levels. If we consider
                              m = 8 to be ‘correct’ for both noise levels, all rules behave poorly for the
                              high noise level. For the low noise level, $\hat{f}_q$ and Bartlett’s tests do best.
                              If m = 2 is deemed correct for the high noise level, the best procedures
                              are Kaiser’s modified rule with threshold 2, the scree graph, and all four
                              varieties of cross-validation. Even within this restricted study no rule is
                              consistently good.
                                Bartkowiak (1991) gives an empirical comparison for some meteorologi-
                              cal data of: subjective rules based on cumulative variance and on the scree
                              and LEV diagrams; the rule based on eigenvalues greater than 1 or 0.7; the
                              broken stick rule; Velicer’s criterion. Most of the rules lead to similar deci-
                              sions, except for the broken stick rule, which retains too few components,
                              and the LEV diagram, which is impossible to interpret unambiguously.
                              The conclusion for the broken stick rule is the opposite of that in Jackson’s
                              (1993) study.
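
To make the disagreement between rules concrete, the following is a minimal Python sketch of three of the simpler rules compared above: cumulative variance with a 90% cut-off, an eigenvalue threshold (1 for Kaiser's rule, 0.7 for the modified version), and the broken stick rule. The function names and the eigenvalues are hypothetical, chosen only for illustration; they are not taken from any of the studies cited. The eigenvalues are assumed to come from a correlation matrix with p = 6, so that they sum to p, and the broken stick rule is assumed to stop at the first component whose proportion of variance fails to exceed its expected value.

```python
from itertools import accumulate


def cumulative_variance_rule(eigvals, cutoff=0.90):
    """Smallest m whose leading eigenvalues reach `cutoff` of total variance."""
    ordered = sorted(eigvals, reverse=True)
    total = sum(ordered)
    for m, running in enumerate(accumulate(ordered), start=1):
        if running / total >= cutoff:
            return m
    return len(ordered)


def eigenvalue_threshold_rule(eigvals, threshold=1.0):
    """Number of eigenvalues strictly greater than `threshold`."""
    return sum(1 for value in eigvals if value > threshold)


def broken_stick_rule(eigvals):
    """Retain PCs while the k-th proportion of variance exceeds
    g_k = (1/p) * sum_{j=k}^{p} 1/j; stop at the first failure
    (an assumed stopping convention)."""
    ordered = sorted(eigvals, reverse=True)
    p = len(ordered)
    total = sum(ordered)
    m = 0
    for k, value in enumerate(ordered, start=1):
        expected = sum(1.0 / j for j in range(k, p + 1)) / p
        if value / total <= expected:
            break
        m += 1
    return m


# Hypothetical correlation-matrix eigenvalues for p = 6 (they sum to 6).
eigvals = [3.0, 1.2, 0.75, 0.5, 0.35, 0.2]

results = {
    "cumulative variance (90%)": cumulative_variance_rule(eigvals, 0.90),
    "eigenvalue > 1": eigenvalue_threshold_rule(eigvals, 1.0),
    "eigenvalue > 0.7": eigenvalue_threshold_rule(eigvals, 0.7),
    "broken stick": broken_stick_rule(eigvals),
}
for rule, m in results.items():
    print(f"{rule}: m = {m}")
```

For these illustrative eigenvalues the four rules give m = 4, 2, 3 and 1 respectively, so even a single small example shows the cumulative variance rule retaining the most components and the broken stick rule the fewest.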
                                Throughout our discussion of rules for choosing m we have emphasized
                              the descriptive rôle of PCA and contrasted it with the model-based
                              approach of factor analysis. It is usually the case that the number of compo-
                              nents needed to achieve the objectives of PCA is greater than the number
                              of factors in a factor analysis of the same data. However, this need not
                              be the case when a model-based approach is adopted for PCA (see Sec-
                              tions 3.9, 6.1.5). As Heo and Gabriel (2001) note in the context of biplots
                              (see Section 5.3), the fit of the first few PCs to an underlying population
                              pattern (model) may be much better than their fit to a sample. This im-
                              plies that a smaller value of m may be appropriate for model-based PCA
                              than for descriptive purposes. In other circumstances, too, fewer PCs may
                              be sufficient for the objectives of the analysis. For example, in atmospheric
                              science, where p can be very large, interest may be restricted only to the
                              first few dominant and physically interpretable patterns of variation, even
                              though their number is fewer than that associated with most PCA-based
                              rules. Conversely, sometimes very dominant PCs are predictable and hence
                              of less interest than the next few. In such cases more PCs will be retained
                              than indicated by most rules. The main message is that different objec-
                              tives for a PCA lead to different requirements concerning how many PCs