Page 154 - Jolliffe I. Principal Component Analysis

6.1. How Many Principal Components?
the fact that a fixed sample covariance matrix S can result from different data matrices X. In addition to this two-tiered variability, there are many parameters that can vary: n, p, and particularly the structure of Σ. This means that simulation studies can only examine a fraction of the possible parameter values, and are therefore of restricted applicability. Krzanowski (1983) looks at several different types of structure for Σ, and reaches the conclusion that W chooses about the right number of PCs in each case, although there is a tendency for m to be too small. Wold (1978) also found, in a small simulation study, that R retains too few PCs. This underestimation of m can clearly be overcome by moving the cut-offs for W and R, respectively, slightly below and slightly above unity. Although the cut-offs at R = 1 and W = 1 seem sensible, the reasoning behind them is not rigid, and they could be modified slightly to account for sampling variation in the same way that Kaiser's rule (Section 6.1.2) seems to work better when l* is changed to a value somewhat below unity. In later papers (Krzanowski, 1987a; Krzanowski and Kline, 1995) a threshold for W of 0.9 is used.
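As an illustrative sketch of how such a cut-off rule operates, suppose the sequence of W values has already been computed by cross-validation. One common reading of the rule retains successive components while W stays above the threshold (the function name and the stopping behaviour below are assumptions for illustration, not the book's definition of W itself):

```python
def retain_by_W(w_values, threshold=0.9):
    """Count how many PCs to retain from a sequence of W statistics.

    w_values: W_1, W_2, ... from cross-validation (assumed precomputed).
    threshold: cut-off; 1.0 was the original choice, 0.9 the value used
    in Krzanowski (1987a) and Krzanowski and Kline (1995).
    """
    m = 0
    for w in w_values:
        if w > threshold:
            m += 1  # this component passes the cut-off; keep it
        else:
            break   # stop at the first component failing the cut-off
    return m


# Hypothetical W sequence: the first three components pass at 0.9.
m_hat = retain_by_W([4.2, 1.3, 0.95, 0.4, 0.1], threshold=0.9)
```

Lowering the threshold from 1 to 0.9 makes the rule retain more components, counteracting the underestimation of m noted above.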
                                Krzanowski and Kline (1995) investigate the use of W in the context of
                              factor analysis, and compare the properties and behaviour of W with three
other criteria derived from PRESS(m). Criterion P is based on the ratio

    (PRESS(1) − PRESS(m)) / PRESS(m),

P* on

    (PRESS(0) − PRESS(m)) / PRESS(m),

and R (different from Wold's R) on

    (PRESS(m − 1) − PRESS(m)) / (PRESS(m − 1) − PRESS(m + 1)).
                              In each case the numerator and denominator of the ratio are divided by
                              appropriate degrees of freedom, and in each case the value of m for which
                              the criterion is largest gives the number of factors to be retained. On the
basis of two previously analysed psychological examples, Krzanowski and Kline (1995) conclude that W and P* select appropriate numbers of factors, whereas P and R are erratic and unreliable. As discussed later in this
                              section, selection in factor analysis needs rather different considerations
                              from PCA. Hence a method that chooses the ‘right number’ of factors may
                              select too few PCs.
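The three ratios above can be sketched in code. This is a minimal illustration, assuming the PRESS(m) values have already been obtained by cross-validation; the degrees-of-freedom divisors that, as noted, should scale each numerator and denominator are omitted, and all names are hypothetical:

```python
# Sketch of the P, P*, and R selection ratios from PRESS values.
# press[m] holds PRESS(m) for m = 0, 1, ..., assumed precomputed.

def criterion_P(press, m):
    # (PRESS(1) - PRESS(m)) / PRESS(m)
    return (press[1] - press[m]) / press[m]

def criterion_P_star(press, m):
    # (PRESS(0) - PRESS(m)) / PRESS(m)
    return (press[0] - press[m]) / press[m]

def criterion_R(press, m):
    # (PRESS(m-1) - PRESS(m)) / (PRESS(m-1) - PRESS(m+1))
    return (press[m - 1] - press[m]) / (press[m - 1] - press[m + 1])

def pick_m(press, criterion, m_range):
    # The value of m for which the criterion is largest is retained.
    return max(m_range, key=lambda m: criterion(press, m))


# Hypothetical PRESS sequence: error falls sharply up to m = 3,
# then flattens, so the R ratio peaks at m = 3.
press = [10.0, 6.0, 3.0, 1.5, 1.4, 1.35, 1.33]
m_hat = pick_m(press, criterion_R, range(1, len(press) - 1))
```

Since each criterion compares the drop in PRESS achieved by the mth component against a baseline, all three peak where adding a component stops paying for itself, which is why the largest value of m is the one retained.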
                                Cross-validation of PCs is computationally expensive for large data sets.
                              Mertens et al. (1995) describe efficient algorithms for cross-validation, with
                              applications to principal component regression (see Chapter 8) and in the
investigation of influential observations (Section 10.2). Besse and Ferré
                              (1993) raise doubts about whether the computational costs of criteria based
                              on PRESS(m) are worthwhile. Using Taylor expansions, they show that