the fact that a fixed sample covariance matrix S can result from different
data matrices X. In addition to this two-tiered variability, there are many
parameters that can vary: n, p, and particularly the structure of Σ. This
means that simulation studies can only examine a fraction of the possible
parameter values, and are therefore of restricted applicability. Krzanowski
(1983) looks at several different types of structure for Σ, and reaches the
conclusion that W chooses about the right number of PCs in each case, al-
though there is a tendency for m to be too small. Wold (1978) also found,
in a small simulation study, that R retains too few PCs. This underestima-
tion of m can clearly be overcome by moving the cut-offs for W and R,
respectively, slightly below and slightly above unity. Although the cut-offs
at R = 1 and W = 1 seem sensible, the reasoning behind them is not rigid,
and they could be modified slightly to account for sampling variation in the
same way that Kaiser’s rule (Section 6.1.2) seems to work better when l∗
is changed to a value somewhat below unity. In later papers (Krzanowski,
1987a; Krzanowski and Kline, 1995) a threshold for W of 0.9 is used.
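As a minimal illustration of such a threshold rule, the following Python sketch takes one simple reading of the procedure: the W values are assumed to have been computed already by the cross-validation described earlier, the function name is hypothetical, and components are retained while W stays above a cut-off, here the 0.9 threshold of Krzanowski (1987a) rather than the nominal value of unity.

```python
def components_to_retain(w_values, threshold=0.9):
    """Choose m from Krzanowski's W criterion (a sketch, not his code).

    w_values  : W_m for m = 1, 2, ..., obtained by cross-validation
                (the computation of W itself is not shown here).
    threshold : cut-off slightly below unity, e.g. the 0.9 used in
                Krzanowski (1987a), to allow for sampling variation.
    """
    m = 0
    for w in w_values:
        if w <= threshold:
            break
        m += 1
    return m

# Illustrative (made-up) W values: a cut-off of 0.9 keeps four
# components, whereas the nominal cut-off of 1 would keep only three.
print(components_to_retain([3.2, 1.8, 1.1, 0.95, 0.4]))  # -> 4
```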
Krzanowski and Kline (1995) investigate the use of W in the context of
factor analysis, and compare the properties and behaviour of W with three
other criteria derived from PRESS(m). Criterion P is based on the ratio
(PRESS(1) − PRESS(m)) / PRESS(m),

P∗ on

(PRESS(0) − PRESS(m)) / PRESS(m),

and R (different from Wold’s R) on

(PRESS(m − 1) − PRESS(m)) / (PRESS(m − 1) − PRESS(m + 1)).
In each case the numerator and denominator of the ratio are divided by
appropriate degrees of freedom, and in each case the value of m for which
the criterion is largest gives the number of factors to be retained. On the
basis of two previously analysed psychological examples, Krzanowski and
Kline (1995) conclude that W and P∗ select appropriate numbers of factors,
whereas P and R are erratic and unreliable. As discussed later in this
section, selection in factor analysis needs rather different considerations
from PCA. Hence a method that chooses the ‘right number’ of factors may
select too few PCs.
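For concreteness, the three ratios above can be evaluated as in the following Python sketch, assuming the PRESS(m) values for m = 0, 1, ..., M are already available. The degrees-of-freedom divisors are left as inputs, since their exact form, given in Krzanowski and Kline (1995), is not reproduced here; the function and argument names are illustrative.

```python
import numpy as np

def press_criteria(press, num_df=1.0, den_df=1.0):
    """P, P* and R criteria of Krzanowski and Kline (1995) (sketch).

    press  : PRESS(m) for m = 0, 1, ..., M (length M + 1).
    num_df : degrees of freedom dividing each numerator.
    den_df : degrees of freedom dividing each denominator.
             (Scalars or length-M arrays; the appropriate values are
             given in the original paper and are not derived here.)

    Returns P and P* for m = 1, ..., M, and R for m = 1, ..., M - 1
    (R needs PRESS(m + 1)).  The m at which a criterion is largest is
    the suggested number of factors.
    """
    press = np.asarray(press, dtype=float)
    M = len(press) - 1
    num_df = np.broadcast_to(np.asarray(num_df, dtype=float), (M,))
    den_df = np.broadcast_to(np.asarray(den_df, dtype=float), (M,))
    m = np.arange(1, M + 1)
    P = ((press[1] - press[m]) / num_df) / (press[m] / den_df)
    P_star = ((press[0] - press[m]) / num_df) / (press[m] / den_df)
    # R compares successive drops in PRESS, so it stops at m = M - 1.
    R = ((press[:-2] - press[1:-1]) / num_df[:-1]) / \
        ((press[:-2] - press[2:]) / den_df[:-1])
    return P, P_star, R

# Illustrative (made-up) PRESS values that level off after m = 2:
P, P_star, R = press_criteria([10.0, 6.0, 3.0, 2.8, 2.7])
print(1 + int(np.argmax(R)))  # -> 2 factors suggested
```

Note that without the degrees-of-freedom correction, P and P∗ keep increasing for as long as PRESS keeps falling, so that correction is essential for those two criteria; R, which compares successive drops in PRESS, peaks where the decrease levels off even in this uncorrected form.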
Cross-validation of PCs is computationally expensive for large data sets.
Mertens et al. (1995) describe efficient algorithms for cross-validation, with
applications to principal component regression (see Chapter 8) and in the
investigation of influential observations (Section 10.2). Besse and Ferré
(1993) raise doubts about whether the computational costs of criteria based
on PRESS(m) are worthwhile. Using Taylor expansions, they show that