Page 153 - Jolliffe I. Principal Component Analysis

6. Choosing a Subset of Principal Components or Variables
                              for choosing m. To decide on whether to include the mth PC, Wold (1978)
                              examines the ratio
                              \[
                                R = \frac{\mathrm{PRESS}(m)}{\sum_{i=1}^{n} \sum_{j=1}^{p} \left( {}_{(m-1)}\tilde{x}_{ij} - x_{ij} \right)^{2}}. \tag{6.1.4}
                              \]
                              This compares the prediction error sum of squares after fitting m compo-
                              nents, with the sum of squared differences between observed and estimated
                              data points based on all the data, using (m − 1) components. If R < 1,
                              then the implication is that a better prediction is achieved using m rather
                              than (m − 1) PCs, so that the mth PC should be included.
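As a rough illustration of (6.1.4), the sketch below computes R for a given m. It assumes the data matrix is column-centred, that the rank-(m − 1) estimates are obtained by truncating the singular value decomposition of the full data matrix, and that PRESS(m) has already been computed by cross-validation and is passed in. The function name `wold_r` and its arguments are hypothetical, chosen only for this example.

```python
import numpy as np

def wold_r(X, press_m, m):
    """Ratio R of (6.1.4) for the m-th PC.

    X        : (n, p) data matrix (centred here by column means; an assumption)
    press_m  : PRESS(m), supplied by the caller from a cross-validation run
    m        : index of the component under consideration (m >= 1)
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = m - 1
    # Rank-(m-1) estimate of every data point, based on all the data.
    X_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Denominator of (6.1.4): squared differences between estimates and observations.
    rss = np.sum((X_hat - Xc) ** 2)
    return press_m / rss
```

A value below 1 returned for a given m would, following the rule above, suggest including the mth PC. The denominator is zero once m − 1 reaches the rank of the data matrix, so the sketch is only meaningful for smaller m.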
                                The approach of Eastment and Krzanowski (1982) is similar to that in an
                              analysis of variance. The reduction in prediction (residual) sum of squares
                              in adding the mth PC to the model, divided by its degrees of freedom, is
                              compared to the prediction sum of squares after fitting m PCs, divided by
                              its degrees of freedom. Their criterion is thus
                              \[
                                W = \frac{[\mathrm{PRESS}(m-1) - \mathrm{PRESS}(m)] / \nu_{m,1}}{\mathrm{PRESS}(m) / \nu_{m,2}}, \tag{6.1.5}
                              \]
                              where ν_{m,1}, ν_{m,2} are the degrees of freedom associated with the numerator
                              and denominator, respectively. It is suggested that if W > 1, then inclusion
                              of the mth PC is worthwhile, although this cut-off at unity is to be inter-
                              preted with some flexibility. It is certainly not appropriate to stop adding
                              PCs as soon as (6.1.5) first falls below unity, because the criterion is not
                              necessarily a monotonic decreasing function of m. Because the ordering
                              of the population eigenvalues may not be the same as that of the sam-
                              ple eigenvalues, especially if consecutive eigenvalues are close, Krzanowski
                              (1987a) considers orders of the components different from those implied by
                              the sample eigenvalues. For the well-known alate adelges data set (see Sec-
                              tion 6.4), Krzanowski (1987a) retains components 1–4 in a straightforward
                              implementation of W, but he keeps only components 1, 2 and 4 when
                              reorderings are allowed. In an example with a large number (100) of variables,
                              Krzanowski and Kline (1995) use W in the context of factor analysis and
                              simply take the number of components with W greater than a threshold,
                              regardless of their position in the ordering of eigenvalues, as an indicator of
                              the number of factors to retain. For example, the result where W exceeds
                              0.9 for components 1, 2, 4, 18 and no others is taken to indicate that a
                              4-factor solution is appropriate.
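The use of (6.1.5) described above can be sketched as follows. The PRESS values and degrees of freedom are taken as inputs rather than computed, and the numbers in the usage example are invented purely to show a W that falls below the threshold and then rises again. The function names are hypothetical. The sketch scans every m instead of stopping at the first value below the threshold, and it does not attempt the reordering of components considered by Krzanowski (1987a).

```python
def w_statistic(press_prev, press_m, nu1, nu2):
    """W of (6.1.5): drop in PRESS per degree of freedom from adding the m-th PC,
    relative to the remaining PRESS per degree of freedom."""
    return ((press_prev - press_m) / nu1) / (press_m / nu2)

def components_above(press, nu1, nu2, threshold=1.0):
    """Return every m whose W exceeds the threshold.

    press : sequence with press[m] = PRESS(m), starting at m = 0
    nu1   : nu1[m - 1] is the numerator degrees of freedom for component m
    nu2   : nu2[m - 1] is the denominator degrees of freedom for component m
    """
    selected = []
    for m in range(1, len(press)):
        w = w_statistic(press[m - 1], press[m], nu1[m - 1], nu2[m - 1])
        if w > threshold:
            selected.append(m)
    return selected

# Made-up values for illustration only; not taken from any data set.
press = [100.0, 60.0, 40.0, 38.0, 20.0]
nu1 = [10, 10, 10, 10]
nu2 = [50, 40, 30, 20]
print(components_above(press, nu1, nu2))  # [1, 2, 4]
```

With these invented inputs W drops below 1 at m = 3 and exceeds it again at m = 4, which is the kind of non-monotonic behaviour that makes stopping at the first value below unity inappropriate. Passing `threshold=0.9` would correspond to the cut-off quoted for the Krzanowski and Kline (1995) example.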
                                It should be noted that although the criteria described in this section
                              are somewhat less ad hoc than those of Sections 6.1.1–6.1.3, there is still
                              no real attempt to set up a formal significance test to decide on m. Some
                              progress has been made by Krzanowski (1983) in investigating the sam-
                              pling distribution of W using simulated data. He points out that there are
                              two sources of variability to be considered in constructing such a distri-
                              bution; namely the variability due to different sample covariance matrices
                              S for a fixed population covariance matrix Σ and the variability due to