
6. Choosing a Subset of Principal Components or Variables
the cut-off point. In other words, in order to retain $q$ PCs the last $(p - q)$ eigenvalues should have a linear trend. Bentler and Yuan (1996, 1998) develop procedures for testing, in the case of covariance and correlation matrices respectively, the null hypothesis

$$H^{*}_{q}: \lambda_{q+k} = \alpha + \beta x_{k}, \qquad k = 1, 2, \ldots, (p - q),$$

where $\alpha$, $\beta$ are non-negative constants and $x_{k} = (p - q) - k$.
For covariance matrices a maximum likelihood ratio test (MLRT) can be used straightforwardly, with the null distribution of the test statistic approximated by a $\chi^{2}$ distribution. In the correlation case Bentler and Yuan (1998) use simulations to compare the MLRT, treating the correlation matrix as a covariance matrix, with a minimum $\chi^{2}$ test. They show that the MLRT has a seriously inflated Type I error, even for very large sample sizes. The properties of the minimum $\chi^{2}$ test are not ideal, but the test gives plausible results in the examples examined by Bentler and Yuan. They conclude that it is reliable for sample sizes of 100 or larger. The discussion section of Bentler and Yuan (1998) speculates on improvements for smaller sample sizes, on potential problems caused by possible different orderings of eigenvalues in populations and samples, and on the possibility of testing hypotheses for specific non-linear relationships among the last $(p - q)$ eigenvalues.
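To make the linear-trend null hypothesis concrete, the following sketch fits the last $(p - q)$ eigenvalues of a sample covariance matrix to the line $\alpha + \beta x_{k}$ by ordinary least squares and reports the residuals. This is only a descriptive illustration of the quantity being tested, not Bentler and Yuan's MLRT; the function name, the simulated data, and the unconstrained least-squares fit (their $\alpha$, $\beta$ are restricted to be non-negative) are illustrative assumptions.

```python
import numpy as np

def trailing_eigenvalue_linearity(S, q):
    """Least-squares fit of lambda_{q+k} = alpha + beta * x_k,
    with x_k = (p - q) - k, to the last (p - q) eigenvalues of S."""
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues, descending
    p = len(eigvals)
    lam = eigvals[q:]                                # lambda_{q+1}, ..., lambda_p
    k = np.arange(1, p - q + 1)
    x = (p - q) - k                                  # x_k as defined in the text
    A = np.column_stack([np.ones_like(x, dtype=float), x])
    (alpha, beta), *_ = np.linalg.lstsq(A, lam, rcond=None)
    residuals = lam - (alpha + beta * x)
    return alpha, beta, residuals

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))                    # simulated data: n = 200, p = 8
alpha, beta, res = trailing_eigenvalue_linearity(np.cov(X, rowvar=False), q=3)
print(alpha, beta, np.abs(res).max())                # small residuals: linear trend plausible
```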
Ali et al. (1985) propose a method for choosing $m$ based on testing hypotheses for correlations between the variables and the components. Recall from Section 2.3 that, for a correlation matrix PCA and the normalization $\tilde{\alpha}'_{k}\tilde{\alpha}_{k} = \lambda_{k}$, the coefficients $\tilde{\alpha}_{kj}$ are precisely these correlations. Similarly, the sample coefficients $\tilde{a}_{kj}$ are correlations between the $k$th PC and the $j$th variable in the sample. The normalization constraint means that the coefficients will decrease on average as $k$ increases. Ali et al. (1985) suggest defining $m$ as one fewer than the index of the first PC for which none of these correlation coefficients is significantly different from zero at the 5% significance level. However, there is one immediate difficulty with this suggestion. For a fixed level of significance, the critical values for correlation coefficients decrease in absolute value as the sample size $n$ increases. Hence, for a given sample correlation matrix, the number of PCs retained depends on $n$: more components will be kept as $n$ increases.
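A minimal sketch of this rule, assuming a standard $t$-based test for a zero correlation (the precise test used by Ali et al. is not specified here), might look as follows; the function name and the significance machinery are illustrative.

```python
import numpy as np
from scipy import stats

def choose_m_ali(data, level=0.05):
    """Return m: one fewer than the (1-based) index of the first PC whose
    correlations with all p variables are insignificant at `level`."""
    n, p = data.shape
    eigval, eigvec = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    order = np.argsort(eigval)[::-1]                 # sort eigenvalues descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    # With the normalization of Section 2.3, column k of `corr` holds the
    # sample correlations between PC k and the p variables.
    corr = eigvec * np.sqrt(np.clip(eigval, 0, None))
    # Two-sided critical value for r = 0, from t = r * sqrt((n-2)/(1-r^2)).
    t_crit = stats.t.ppf(1 - level / 2, df=n - 2)
    r_crit = t_crit / np.sqrt(n - 2 + t_crit**2)
    for k in range(p):
        if np.all(np.abs(corr[:, k]) < r_crit):
            return k                                 # (k + 1) - 1 in 1-based terms
    return p                                         # no such PC: retain all
```

Because $r_{\mathrm{crit}}$ shrinks as $n$ grows, running this on the same correlation matrix with a larger nominal $n$ returns a larger $m$, which is exactly the difficulty noted above.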


6.1.5 Choice of m Using Cross-Validatory or Computationally Intensive Methods

The rule described in Section 6.1.1 is equivalent to looking at how well the data matrix $\mathbf{X}$ is fitted by the rank $m$ approximation based on the SVD.
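As a minimal sketch, assuming a column-centred data matrix, this fitted proportion can be computed directly from the SVD; the function name is illustrative.

```python
import numpy as np

def rank_m_fit(X, m):
    """Proportion of total variation accounted for by the rank-m SVD
    approximation to the column-centred data matrix."""
    Xc = X - X.mean(axis=0)                          # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_m = (U[:, :m] * s[:m]) @ Vt[:m]                # rank-m reconstruction
    return 1.0 - np.sum((Xc - X_m) ** 2) / np.sum(Xc ** 2)
```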
The idea behind the first two methods discussed in the present section is similar, except that each element $x_{ij}$ of $\mathbf{X}$ is now predicted from an equation like the SVD, but based on a submatrix of $\mathbf{X}$ that does not include $x_{ij}$. In both methods, suggested by Wold (1978) and Eastment and Krzanowski