Page 157 - Jolliffe I. Principal Component Analysis
6. Choosing a Subset of Principal Components or Variables
estimates depend on the reciprocal of the difference between l_m and l_{m+1}
where, as before, m is the number of PCs retained. The usual implemen-
tations of the rules of Sections 6.1.1, 6.1.2 ignore the size of gaps between
eigenvalues and hence do not take stability into account. However, it is ad-
visable when using Kaiser’s rule or one of its modifications, or a rule based
on cumulative variance, to treat the threshold with flexibility, and be pre-
pared to move it, if it does not correspond to a good-sized gap between
eigenvalues.
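As an illustration, the flexibility recommended here could be automated along the following lines. The 0.7 threshold follows Jolliffe's (1972) modification of Kaiser's rule discussed later in this section; the `min_gap` value is an arbitrary illustrative choice, not something prescribed by the text.

```python
import numpy as np

def kaiser_with_gap(eigvals, threshold=0.7, min_gap=0.2):
    """Choose m by a Kaiser-style threshold, then move the cut-off
    to the nearest good-sized gap l_m - l_{m+1} for stability.

    eigvals: eigenvalues of the correlation matrix.
    threshold, min_gap: illustrative tuning choices.
    """
    eigvals = np.sort(np.asarray(eigvals))[::-1]
    m = int(np.sum(eigvals > threshold))            # Kaiser-style cut
    gaps = eigvals[:-1] - eigvals[1:]               # l_m - l_{m+1}
    good = np.where(gaps >= min_gap)[0] + 1         # cuts after a sizable gap
    if good.size == 0:
        return m                                    # no clear gap: keep the cut
    return int(good[np.argmin(np.abs(good - m))])   # nearest good-sized gap
```

For example, with eigenvalues (3.0, 2.5, 0.4, 0.05, 0.05) the threshold cut m = 2 already sits at a large gap (2.5 − 0.4) and is kept unchanged.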
Besse and de Falguerolles (1993) also examine a real data set with p = 16
and n = 60. Kaiser’s rule chooses m = 5, and the scree graph suggests either
m = 3 or m = 5. The bootstrap and jackknife criteria behave similarly to
each other. Ignoring the uninteresting minimum at m = 1, all four methods
choose m = 3, although there are strong secondary minima at m = 8 and
m = 5.
Another model-based rule is introduced by Bishop (1999) and, even
though one of its merits is said to be that it avoids cross-validation, it
seems appropriate to mention it here. Bishop (1999) proposes a Bayesian
framework for Tipping and Bishop’s (1999a) model, which was described in
Section 3.9. Recall that under this model the covariance matrix underlying
the data can be written as BB′ + σ²I_p, where B is a (p × q) matrix. The
prior distribution of B in Bishop’s (1999) framework allows q, the number
of columns of B, to take its maximum possible value (= p − 1) under the
model. However, if the posterior distribution assigns small values to all
elements of a column b_k of B, then that dimension is removed. The mode
of the posterior distribution
can be found using the EM algorithm.
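To make the underlying model concrete, the following is a minimal sketch of the maximum likelihood fit of Tipping and Bishop’s (1999a) model itself (not Bishop’s Bayesian extension, whose EM details are beyond a short example). It uses their closed-form solution: B is built from the leading eigenvectors of the sample covariance matrix, and σ² is the average of the discarded eigenvalues.

```python
import numpy as np

def ppca_ml(X, q):
    """ML fit of the model whose covariance is B B' + sigma^2 I_p
    (Tipping and Bishop's closed-form solution), for a chosen q."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]               # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                     # average discarded eigenvalue
    # Columns of B scaled so that B B' + sigma^2 I reproduces the
    # top q eigenvalues of S exactly.
    B = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return B, sigma2
```

A useful check of the fit is that BB′ + σ²I_p has the same leading q eigenvalues as the sample covariance matrix, with all remaining eigenvalues equal to σ².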
Jackson (1993) discusses two bootstrap versions of ‘parallel analysis,’
which was described in general terms in Section 6.1.3. The first, which
is a modification of Kaiser’s rule defined in Section 6.1.2, uses bootstrap
samples from a data set to construct confidence limits for the popula-
tion eigenvalues (see Section 3.7.2). Only those components for which the
corresponding 95% confidence interval lies entirely above 1 are retained.
Unfortunately, although this criterion is reasonable as a means of deciding
the number of factors in a factor analysis (see Chapter 7), it is inappropri-
ate in PCA. This is because it will not retain PCs dominated by a single
variable whose correlations with all the other variables are close to zero.
Such variables are generally omitted from a factor model, but they provide
information not available from other variables and so should be retained if
most of the information in X is to be kept. Jolliffe’s (1972) suggestion of
reducing Kaiser’s threshold from 1 to around 0.7 reflects the fact that we
are dealing with PCA and not factor analysis. A bootstrap rule designed
with PCA in mind would retain all those components for which the 95%
confidence interval for the corresponding eigenvalue does not lie entirely
below 1.
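A sketch of this PCA-oriented variant of the bootstrap rule might look as follows in Python. The function name, the number of resamples, and the use of the correlation matrix (implied by the threshold of 1) are illustrative choices, not details fixed by Jackson (1993).

```python
import numpy as np

def bootstrap_retain(X, n_boot=1000, alpha=0.05, seed=0):
    """Retain component k unless the bootstrap (1 - alpha) confidence
    interval for its correlation-matrix eigenvalue lies entirely below 1,
    i.e. keep it whenever the interval's upper endpoint reaches 1."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot_eigs = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                       # resample rows
        R = np.corrcoef(X[idx], rowvar=False)
        boot_eigs[b] = np.sort(np.linalg.eigvalsh(R))[::-1]
    upper = np.quantile(boot_eigs, 1 - alpha / 2, axis=0)  # upper CI endpoints
    return int(np.sum(upper >= 1.0))                       # number of PCs kept
```

Unlike the factor-analysis version, this rule errs on the side of keeping a component whose eigenvalue is merely consistent with 1, so a PC dominated by a single near-uncorrelated variable is not automatically discarded.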
A second bootstrap approach suggested by Jackson (1993) finds 95%
confidence intervals for both eigenvalues and eigenvector coefficients. To

