Page 153 - Jolliffe I. Principal Component Analysis
6. Choosing a Subset of Principal Components or Variables
for choosing m. To decide on whether to include the mth PC, Wold (1978)
examines the ratio
$$
R = \frac{\mathrm{PRESS}(m)}{\sum_{i=1}^{n} \sum_{j=1}^{p} \left( {}_{(m-1)}\tilde{x}_{ij} - x_{ij} \right)^{2}}. \tag{6.1.4}
$$
This compares the prediction error sum of squares after fitting m compo-
nents, with the sum of squared differences between observed and estimated
data points based on all the data, using (m − 1) components. If R < 1,
then the implication is that a better prediction is achieved using m rather
than (m − 1) PCs, so that the mth PC should be included.
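
As a minimal sketch of how the ratio in (6.1.4) might be evaluated, the following assumes that PRESS(m) has already been obtained by cross-validation and that the (m − 1)-component estimates of the data, fitted to all observations, are available. The function name `wold_ratio`, its argument names and the numbers in the example are hypothetical and serve only as illustration.

```python
def wold_ratio(press_m, x, x_fit_prev):
    """Ratio R of (6.1.4) for the m-th PC.

    press_m    -- PRESS(m), assumed to be computed elsewhere by cross-validation
    x          -- observed data: a list of n rows, each holding p values
    x_fit_prev -- estimates of x based on all the data, using (m - 1) PCs
    """
    # Denominator: sum over i and j of (estimated - observed)^2.
    resid_ss = sum(
        (fit - obs) ** 2
        for row_obs, row_fit in zip(x, x_fit_prev)
        for obs, fit in zip(row_obs, row_fit)
    )
    if resid_ss == 0:
        raise ValueError("(m - 1) components already reproduce the data exactly")
    return press_m / resid_ss


# Made-up illustrative values, not taken from any data set.
x = [[1.0, 2.0], [3.0, 5.0], [4.0, 4.0]]
x_fit_prev = [[1.5, 2.0], [3.0, 4.0], [4.0, 4.5]]
r = wold_ratio(1.2, x, x_fit_prev)

# R < 1 suggests that the m-th PC should be included.
include_mth_pc = r < 1
```

With these invented inputs the residual sum of squares is 1.5, so R falls below 1 and the mth PC would be retained under Wold's rule.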
The approach of Eastment and Krzanowski (1982) is similar to that in an
analysis of variance. The reduction in prediction (residual) sum of squares
in adding the mth PC to the model, divided by its degrees of freedom, is
compared to the prediction sum of squares after fitting m PCs, divided by
its degrees of freedom. Their criterion is thus
$$
W = \frac{[\mathrm{PRESS}(m-1) - \mathrm{PRESS}(m)] / \nu_{m,1}}{\mathrm{PRESS}(m) / \nu_{m,2}}, \tag{6.1.5}
$$
where $\nu_{m,1}$ and $\nu_{m,2}$ are the degrees of freedom associated with the numerator
and denominator, respectively. It is suggested that if W > 1, then inclusion
of the mth PC is worthwhile, although this cut-off at unity is to be inter-
preted with some flexibility. It is certainly not appropriate to stop adding
PCs as soon as (6.1.5) first falls below unity, because the criterion is not
necessarily a monotonic decreasing function of m. Because the ordering
of the population eigenvalues may not be the same as that of the sam-
ple eigenvalues, especially if consecutive eigenvalues are close, Krzanowski
(1987a) considers orders of the components different from those implied by
the sample eigenvalues. For the well-known alate adelges data set (see Sec-
tion 6.4), Krzanowski (1987a) retains components 1–4 in a straightforward
implementation of W, but he keeps only components 1, 2, 4 when
reorderings are allowed. In an example with a large number (100) of variables,
Krzanowski and Kline (1995) use W in the context of factor analysis and
simply take the number of components with W greater than a threshold,
regardless of their position in the ordering of eigenvalues, as an indicator of
the number of factors to retain. For example, the result where W exceeds
0.9 for components 1, 2, 4, 18 and no others is taken to indicate that a
4-factor solution is appropriate.
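
The criterion (6.1.5) and the threshold-counting rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the PRESS values and the degrees of freedom are taken as given inputs, `press[0]` is assumed to hold the PRESS value with no components fitted, and the function names and all numbers are hypothetical.

```python
def w_criterion(press, nu1, nu2):
    """Return {m: W} for m = 1, ..., M, following (6.1.5).

    press    -- [PRESS(0), PRESS(1), ..., PRESS(M)], assumed precomputed
    nu1, nu2 -- dicts mapping m to the numerator and denominator
                degrees of freedom, supplied by the caller
    """
    w = {}
    for m in range(1, len(press)):
        numerator = (press[m - 1] - press[m]) / nu1[m]
        denominator = press[m] / nu2[m]
        w[m] = numerator / denominator
    return w


def components_above(w, threshold=1.0):
    """Components whose W exceeds the threshold, wherever they occur in the ordering."""
    return sorted(m for m, value in w.items() if value > threshold)


# Made-up illustrative values, not taken from any data set.
press = [100.0, 60.0, 40.0, 38.0, 20.0]
nu1 = {1: 10, 2: 8, 3: 6, 4: 4}
nu2 = {1: 30, 2: 22, 3: 16, 4: 12}

w = w_criterion(press, nu1, nu2)
selected = components_above(w, threshold=1.0)
```

In this invented example W drops below 1 at m = 3 and rises above 1 again at m = 4, which shows why stopping at the first value below unity can miss a later component. All values of W are examined before counting, rather than stopping at the first failure.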
It should be noted that although the criteria described in this section
are somewhat less ad hoc than those of Sections 6.1.1–6.1.3, there is still
no real attempt to set up a formal significance test to decide on m. Some
progress has been made by Krzanowski (1983) in investigating the sam-
pling distribution of W using simulated data. He points out that there are
two sources of variability to be considered in constructing such a distri-
bution; namely the variability due to different sample covariance matrices
S for a fixed population covariance matrix Σ and the variability due to

