6. Choosing a Subset of Principal Components or Variables
estimates depend on the reciprocal of the difference between $l_m$ and $l_{m+1}$, where, as before, m is the number of PCs retained. The usual implementations of the rules of Sections 6.1.1 and 6.1.2 ignore the size of gaps between eigenvalues and hence do not take stability into account. However, it is advisable when using Kaiser's rule or one of its modifications, or a rule based on cumulative variance, to treat the threshold with flexibility, and to be prepared to move it if it does not correspond to a good-sized gap between eigenvalues.
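As a concrete illustration, a minimal Python sketch of this advice follows: it applies Kaiser's rule to the correlation matrix and reports the gap $l_m - l_{m+1}$ at the cut-off, so that the threshold can be moved when the gap is small. The function name and the min_gap tolerance are illustrative assumptions, not part of any rule described in this chapter.

import numpy as np

def kaiser_with_gap_check(X, threshold=1.0, min_gap=0.1):
    # Kaiser's rule on the correlation matrix, with a report on the
    # eigenvalue gap at the cut-off, since the stability of the choice
    # of m depends on the size of the gap between l_m and l_{m+1}.
    R = np.corrcoef(X, rowvar=False)              # p x p correlation matrix
    l = np.sort(np.linalg.eigvalsh(R))[::-1]      # l_1 >= l_2 >= ... >= l_p
    m = int(np.sum(l > threshold))                # components above threshold
    if 0 < m < len(l):
        gap = l[m - 1] - l[m]                     # gap between l_m and l_{m+1}
        if gap < min_gap:
            print("small gap l_m - l_{m+1} = %.3f: "
                  "the choice m = %d is unstable" % (gap, m))
    return m, l

Replacing threshold=1.0 by 0.7 gives the modification of Kaiser's rule suggested by Jolliffe (1972), discussed later in this section.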
Besse and de Falguerolles (1993) also examine a real data set with p = 16 and n = 60. Kaiser's rule chooses m = 5, and the scree graph suggests either m = 3 or m = 5. The bootstrap and jackknife criteria behave similarly to each other. Ignoring the uninteresting minimum at m = 1, all four methods choose m = 3, although there are strong secondary minima at m = 8 and m = 5.
Another model-based rule is introduced by Bishop (1999) and, even though one of its merits is said to be that it avoids cross-validation, it seems appropriate to mention it here. Bishop (1999) proposes a Bayesian framework for Tipping and Bishop's (1999a) model, which was described in Section 3.9. Recall that under this model the covariance matrix underlying the data can be written as $BB' + \sigma^2 I_p$, where B is a (p × q) matrix. The prior distribution of B in Bishop's (1999) framework allows q to take its maximum possible value (= p − 1) under the model. However, if the posterior distribution assigns small values to all elements of a column $b_k$ of B, then that dimension is removed. The mode of the posterior distribution can be found using the EM algorithm.
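Bishop's full Bayesian treatment is too involved for a short example, but the pruning mechanism it relies on can be illustrated cheaply. The sketch below fits Tipping and Bishop's (1999a) model by its closed-form maximum likelihood solution with the maximum q = p − 1 columns, then drops columns of B whose elements are uniformly small. This is only an illustrative stand-in: the tolerance tol and the function name are assumptions, and Bishop's actual method removes dimensions via the posterior mode found by EM rather than by thresholding an ML fit.

import numpy as np

def ppca_prune(X, tol=0.05):
    # Fit the probabilistic PCA model with q = p - 1, its maximum
    # under the model, then prune columns b_k of B whose norms are
    # near zero -- mimicking the dimensions Bishop's posterior removes.
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)              # sample covariance matrix
    lam, U = np.linalg.eigh(S)
    lam, U = lam[::-1], U[:, ::-1]            # eigenvalues in descending order
    p = S.shape[0]
    q = p - 1                                 # maximum q under the model
    sigma2 = lam[q:].mean()                   # ML noise variance
    # ML solution: B = U_q (Lambda_q - sigma2 I)^{1/2}, rotation taken as I
    scale = np.sqrt(np.maximum(lam[:q] - sigma2, 0.0))
    B = U[:, :q] * scale
    norms = np.linalg.norm(B, axis=0)         # ||b_k|| for each column of B
    keep = norms > tol                        # remove near-zero dimensions
    return B[:, keep], sigma2, norms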
Jackson (1993) discusses two bootstrap versions of 'parallel analysis', which was described in general terms in Section 6.1.3. The first, which is a modification of Kaiser's rule defined in Section 6.1.2, uses bootstrap samples from a data set to construct confidence limits for the population eigenvalues (see Section 3.7.2). Only those components for which the corresponding 95% confidence interval lies entirely above 1 are retained. Unfortunately, although this criterion is reasonable as a means of deciding the number of factors in a factor analysis (see Chapter 7), it is inappropriate in PCA. This is because it will not retain PCs dominated by a single variable whose correlations with all the other variables are close to zero. Such variables are generally omitted from a factor model, but they provide information not available from other variables and so should be retained if most of the information in X is to be kept. Jolliffe's (1972) suggestion of reducing Kaiser's threshold from 1 to around 0.7 reflects the fact that we are dealing with PCA and not factor analysis. A bootstrap rule designed with PCA in mind would retain all those components for which the 95% confidence interval for the corresponding eigenvalue does not lie entirely below 1.
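A small sketch of both bootstrap rules, the factor-analysis-style rule Jackson (1993) describes and the PCA-oriented variant suggested above, might look as follows. The percentile intervals, the number of bootstrap samples, and the function name are illustrative assumptions; Jackson's own implementation details may differ.

import numpy as np

rng = np.random.default_rng(0)

def bootstrap_eigen_ci(X, n_boot=1000, level=0.95):
    # Percentile bootstrap confidence intervals for the eigenvalues
    # of the correlation matrix, one interval per eigenvalue rank.
    n, p = X.shape
    boots = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample rows with replacement
        R = np.corrcoef(X[idx], rowvar=False)
        boots[b] = np.sort(np.linalg.eigvalsh(R))[::-1]
    alpha = (1 - level) / 2
    lo = np.quantile(boots, alpha, axis=0)
    hi = np.quantile(boots, 1 - alpha, axis=0)
    return lo, hi

# lo, hi = bootstrap_eigen_ci(X)
# m_fa  = np.sum(lo > 1)    # retain only if the whole interval is above 1
# m_pca = np.sum(hi >= 1)   # retain unless the whole interval is below 1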
A second bootstrap approach suggested by Jackson (1993) finds 95% confidence intervals for both eigenvalues and eigenvector coefficients. To