Page 145 - Jolliffe I. Principal Component Analysis
6. Choosing a Subset of Principal Components or Variables
ance. Mandel’s results are based on simulation studies, and although exact
results have been produced by some authors, they are only for limited special
cases. For example, Krzanowski (1979a) gives exact results for $m = 1$
and $p = 3$ or $4$, again under the assumptions of normality, independence
and equal variances for all variables. These assumptions mean that the
results can be used to determine whether or not all variables are independent,
but are of little general use in determining an ‘optimal’ cut-off for
$t_m$. Sugiyama and Tong (1976) describe an approximate distribution for $t_m$
which does not assume independence or equal variances, and which can be
used to test whether $l_1, l_2, \ldots, l_m$ are compatible with any given structure
for $\lambda_1, \lambda_2, \ldots, \lambda_m$, the corresponding population variances. However, the
test still assumes normality and it is only approximate, so it is not clear
how useful it is in practice for choosing an appropriate value of m.
Huang and Tseng (1992) describe a ‘decision procedure for determining
the number of components’ based on $t_m$. Given a proportion of population
variance $\tau$, which one wishes to retain, and the true minimum number of
population PCs $m_\tau$ that achieves this, Huang and Tseng (1992) develop
a procedure for finding a sample size $n$ and a threshold $t^*$ having a
prescribed high probability of choosing $m = m_\tau$. It is difficult to envisage
circumstances where this would be of practical value.
A number of other criteria based on $\sum_{k=m+1}^{p} l_k$ are discussed briefly by
Jackson (1991, Section 2.8.11). In situations where some desired residual
variation can be specified, as sometimes happens for example in quality
control (see Section 13.7), Jackson (1991, Section 2.8.5) advocates choosing
$m$ such that the absolute, rather than percentage, value of $\sum_{k=m+1}^{p} l_k$ first
falls below the chosen threshold.
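
As an illustration of the absolute-threshold rule, the following minimal sketch picks the smallest $m$ for which the residual variance, the sum of the sample PC variances $l_{m+1}, \ldots, l_p$, first falls below a specified value. The function name `choose_m_by_residual`, the example variances and the threshold are hypothetical choices made for this illustration only; they are not taken from Jackson (1991).

```python
import numpy as np

def choose_m_by_residual(variances, threshold):
    """Smallest m such that sum_{k=m+1}^{p} l_k < threshold.

    `variances` are the sample PC variances l_1, ..., l_p (any order);
    `threshold` is the desired absolute residual variance (must be > 0).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Sort so that l_1 >= l_2 >= ... >= l_p.
    l = np.sort(np.asarray(variances, dtype=float))[::-1]
    # tail[m] = l_{m+1} + ... + l_p for m = 0, ..., p (tail[p] = 0).
    tail = np.concatenate((np.cumsum(l[::-1])[::-1], [0.0]))
    # Index of the first residual sum that falls below the threshold.
    return int(np.argmax(tail < threshold))

# Illustrative (made-up) PC variances.
l = [4.0, 2.0, 1.0, 0.5, 0.25]
m = choose_m_by_residual(l, threshold=1.0)
print(m)
```

Because the rule works with the absolute residual sum rather than a percentage, the chosen $m$ depends on the units of measurement, so the threshold has to be specified on the same scale as the variances.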
6.1.2 Size of Variances of Principal Components
The previous rule is equally valid whether a covariance or a correlation
matrix is used to compute the PCs. The rule described in this section is
constructed specifically for use with correlation matrices, although it can
be adapted for some types of covariance matrices. The idea behind the
rule is that if all elements of x are independent, then the PCs are the
same as the original variables and all have unit variances in the case of
a correlation matrix. Thus any PC with variance less than 1 contains less
information than one of the original variables and so is not worth retaining.
The rule, in its simplest form, is sometimes called Kaiser’s rule (Kaiser,
1960) and retains only those PCs whose variances $l_k$ exceed 1. If the data
set contains groups of variables having large within-group correlations, but
small between-group correlations, then there is one PC associated with each
group whose variance is > 1, whereas any other PCs associated with the
group have variances < 1 (see Section 3.8). Thus, the rule will generally
retain one, and only one, PC associated with each such group of variables,
which seems to be a reasonable course of action for data of this type.
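
The behaviour of the rule for grouped variables can be checked numerically. The sketch below applies the cut-off of 1 to the eigenvalues of an artificial correlation matrix with two groups of three variables; the within-group correlation of 0.8, the between-group correlation of 0.1 and the helper names `block_correlation` and `kaiser_rule` are assumptions made purely for illustration.

```python
import numpy as np

def block_correlation(group_sizes, within, between):
    """Correlation matrix with constant within- and between-group correlations."""
    p = sum(group_sizes)
    R = np.full((p, p), float(between))
    start = 0
    for g in group_sizes:
        R[start:start + g, start:start + g] = within
        start += g
    np.fill_diagonal(R, 1.0)
    return R

def kaiser_rule(corr, cutoff=1.0):
    """Return (m, l): the number of PCs with variance > cutoff, and all
    PC variances l_1 >= ... >= l_p (eigenvalues of the correlation matrix)."""
    l = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return int(np.sum(l > cutoff)), l

R = block_correlation([3, 3], within=0.8, between=0.1)
m, l = kaiser_rule(R)
print(m, np.round(l, 3))
```

For this matrix the PC variances are 2.9, 2.3 and four values equal to 0.2, so the rule retains two PCs, one for each group. The variances sum to $p = 6$, the trace of the correlation matrix, which is why 1 is the average PC variance and a natural reference point for the cut-off.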

