the cut-off point. In other words, in order to retain q PCs the last (p − q) eigenvalues should have a linear trend. Bentler and Yuan (1996, 1998) develop procedures for testing in the case of covariance and correlation matrices, respectively, the null hypothesis
$$H^{*}_{q}:\ \lambda_{q+k} = \alpha + \beta x_k, \qquad k = 1, 2, \ldots, (p-q),$$
where $\alpha$, $\beta$ are non-negative constants and $x_k = (p-q) - k$.
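For concreteness, consider an illustrative case (the numbers here are purely hypothetical, not taken from Bentler and Yuan): with p = 10 and q = 6, k runs from 1 to 4 and $x_k = 4 - k$, so the hypothesis asserts
$$\lambda_7 = \alpha + 3\beta, \quad \lambda_8 = \alpha + 2\beta, \quad \lambda_9 = \alpha + \beta, \quad \lambda_{10} = \alpha,$$
that is, the last four eigenvalues decrease linearly to the baseline level $\alpha$.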
For covariance matrices a maximum likelihood ratio test (MLRT) can
be used straightforwardly, with the null distribution of the test statistic
approximated by a $\chi^2$ distribution. In the correlation case Bentler and
Yuan (1998) use simulations to compare the MLRT, treating the correlation
matrix as a covariance matrix, with a minimum $\chi^2$ test. They show that
the MLRT has a seriously inflated Type I error, even for very large sample
sizes. The properties of the minimum $\chi^2$ test are not ideal, but the test
gives plausible results in the examples examined by Bentler and Yuan.
They conclude that it is reliable for sample sizes of 100 or larger. The
discussion section of Bentler and Yuan (1998) speculates on improvements
for smaller sample sizes, on potential problems caused by possible different
orderings of eigenvalues in populations and samples, and on the possibility
of testing hypotheses for specific non-linear relationships among the last
(p − q) eigenvalues.
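As a rough illustration of the quantity under test, the following sketch fits the line $\alpha + \beta x_k$ to the last $p - q$ sample eigenvalues by ordinary least squares. It is only a descriptive diagnostic, not the formal MLRT or minimum $\chi^2$ statistic of Bentler and Yuan; the function name and the unconstrained fit are assumptions of this sketch.

```python
import numpy as np

def eigenvalue_trend_fit(X, q):
    """Fit the last (p - q) eigenvalues of the sample covariance matrix
    to alpha + beta * x_k, with x_k = (p - q) - k, by least squares.
    Diagnostic sketch only: Bentler and Yuan (1996, 1998) give formal
    tests, and their alpha, beta are constrained to be non-negative.
    """
    S = np.cov(X, rowvar=False)                   # sample covariance matrix
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]    # eigenvalues, descending
    p = lam.size
    tail = lam[q:]                                # lambda_{q+1}, ..., lambda_p
    k = np.arange(1, p - q + 1)
    x = (p - q) - k                               # x_k = (p - q) - k
    A = np.column_stack([np.ones_like(x, dtype=float), x])
    (alpha, beta), *_ = np.linalg.lstsq(A, tail, rcond=None)
    residuals = tail - (alpha + beta * x)         # departures from linearity
    return alpha, beta, residuals

# Small residuals for a candidate q are consistent with a linear trend
# in the last p - q eigenvalues (the pattern H_q* describes).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
print(eigenvalue_trend_fit(X, q=4))
```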
Ali et al. (1985) propose a method for choosing m based on testing hypotheses for correlations between the variables and the components. Recall
from Section 2.3 that for a correlation matrix PCA and the normalization
$\tilde{\alpha}_k'\tilde{\alpha}_k = \lambda_k$, the coefficients $\tilde{\alpha}_{kj}$ are precisely these correlations. Similarly,
the sample coefficients $\tilde{a}_{kj}$ are correlations between the kth PC and the
jth variable in the sample. The normalization constraint means that the
coefficients will decrease on average as k increases. Ali et al. (1985) suggest
defining m as one fewer than the index of the first PC for which none of
these correlation coefficients is significantly different from zero at the 5%
significance level. However, there is one immediate difficulty with this sug-
gestion. For a fixed level of significance, the critical values for correlation
coefficients decrease in absolute value as the sample size n increases. Hence
for a given sample correlation matrix, the number of PCs retained depends
on n. More components will be kept as n increases.
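A minimal sketch of this rule, assuming the PCA is carried out on the sample correlation matrix and using the large-sample approximation that a correlation is significantly different from zero at the 5% level when $|r| > 1.96/\sqrt{n}$; the function name and the normal approximation are assumptions of the sketch, not part of Ali et al. (1985).

```python
import numpy as np

def choose_m_ali(X):
    """Sketch of the rule of Ali et al. (1985): m is one fewer than the
    index of the first PC none of whose correlations with the variables
    is significant at the 5% level.  Uses the large-sample criterion
    |r| > 1.96 / sqrt(n) for significance of a correlation.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)          # sample correlation matrix
    lam, vecs = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]             # sort eigenpairs, descending
    lam, vecs = lam[order], vecs[:, order]
    # Scaled coefficients a~_kj = a_kj * sqrt(lambda_k) are the sample
    # correlations between the kth PC and the jth variable.
    loadings = vecs * np.sqrt(np.clip(lam, 0.0, None))
    crit = 1.96 / np.sqrt(n)                  # approximate 5% critical value
    for k in range(p):                        # k is the 0-based PC index
        if np.all(np.abs(loadings[:, k]) < crit):
            return k                          # first 'dead' PC is number k+1
    return p                                  # every PC has a significant loading

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 6))
print(choose_m_ali(X))
```

The n-dependence noted above is visible here: as n grows, crit shrinks, so the first PC with no significant loadings occurs later and the returned m increases.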
6.1.5 Choice of m Using Cross-Validatory or
Computationally Intensive Methods
The rule described in Section 6.1.1 is equivalent to looking at how well the
data matrix X is fitted by the rank m approximation based on the SVD.
The idea behind the first two methods discussed in the present section is
similar, except that each element $x_{ij}$ of X is now predicted from an equation
like the SVD, but based on a submatrix of X that does not include $x_{ij}$. In
both methods, suggested by Wold (1978) and Eastment and Krzanowski

