Page 145 - Jolliffe I. Principal Component Analysis

6. Choosing a Subset of Principal Components or Variables
                              ance. Mandel’s results are based on simulation studies, and although exact
                              results have been produced by some authors, they are only for limited spe-
                              cial cases. For example, Krzanowski (1979a) gives exact results for m =1
                              and p = 3 or 4, again under the assumptions of normality, independence
                              and equal variances for all variables. These assumptions mean that the
                              results can be used to determine whether or not all variables are independent,
                              but are of little general use in determining an ‘optimal’ cut-off for
                              $t_m$. Sugiyama and Tong (1976) describe an approximate distribution for $t_m$
                              which does not assume independence or equal variances, and which can be
                              used to test whether $l_1, l_2, \ldots, l_m$ are compatible with any given structure
                              for $\lambda_1, \lambda_2, \ldots, \lambda_m$, the corresponding population variances. However, the
                              test still assumes normality and it is only approximate, so it is not clear
                              how useful it is in practice for choosing an appropriate value of m.
                                Huang and Tseng (1992) describe a ‘decision procedure for determining
                              the number of components’ based on $t_m$. Given a proportion of population
                              variance $\tau$, which one wishes to retain, and the true minimum number of
                              population PCs $m_\tau$ that achieves this, Huang and Tseng (1992) develop
                              a procedure for finding a sample size n and a threshold $t^*$ having a
                              prescribed high probability of choosing $m = m_\tau$. It is difficult to envisage
                              circumstances where this would be of practical value.

                                A number of other criteria based on $\sum_{k=m+1}^{p} l_k$ are discussed briefly by
                              Jackson (1991, Section 2.8.11). In situations where some desired residual
                              variation can be specified, as sometimes happens for example in quality
                              control (see Section 13.7), Jackson (1991, Section 2.8.5) advocates choosing
                              m such that the absolute, rather than percentage, value of $\sum_{k=m+1}^{p} l_k$ first
                              falls below the chosen threshold.
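The absolute-threshold rule just described can be illustrated with a minimal sketch. The function name `choose_m_by_residual_variance`, the example eigenvalues and the threshold are all hypothetical, chosen only for illustration; the sketch assumes the sample PC variances $l_1 \ge l_2 \ge \ldots \ge l_p$ are already available and that the threshold is positive.

```python
import numpy as np

def choose_m_by_residual_variance(eigenvalues, threshold):
    """Return the smallest m for which sum_{k=m+1}^{p} l_k < threshold.

    `eigenvalues` are the sample PC variances l_1..l_p (any order);
    `threshold` is an absolute (not percentage) residual variance.
    """
    l = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    # residual[m] = l_{m+1} + ... + l_p for m = 0..p (residual[p] = 0)
    residual = np.append(np.cumsum(l[::-1])[::-1], 0.0)
    below = np.nonzero(residual < threshold)[0]
    if below.size == 0:
        raise ValueError("threshold must be positive")
    return int(below[0])

# Illustrative (made-up) variances and threshold
l = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]
m = choose_m_by_residual_variance(l, threshold=1.5)
print(m)
```

For these made-up values the residual sums for m = 0, 1, ..., 6 are 8.0, 4.0, 2.0, 1.0, 0.5, 0.2 and 0, so the first value below 1.5 occurs at m = 3.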

                              6.1.2 Size of Variances of Principal Components

                              The previous rule is equally valid whether a covariance or a correlation
                              matrix is used to compute the PCs. The rule described in this section is
                              constructed specifically for use with correlation matrices, although it can
                              be adapted for some types of covariance matrices. The idea behind the
                              rule is that if all elements of x are independent, then the PCs are the
                              same as the original variables and all have unit variances in the case of
                              a correlation matrix. Thus any PC with variance less than 1 contains less
                              information than one of the original variables and so is not worth retaining.
                              The rule, in its simplest form, is sometimes called Kaiser’s rule (Kaiser,
                              1960) and retains only those PCs whose variances $l_k$ exceed 1. If the data
                              set contains groups of variables having large within-group correlations, but
                              small between-group correlations, then there is one PC associated with each
                              group whose variance is greater than 1, whereas any other PCs associated with the
                              group have variances less than 1 (see Section 3.8). Thus, the rule will generally
                              retain one, and only one, PC associated with each such group of variables,
                              which seems to be a reasonable course of action for data of this type.
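As a rough illustration of Kaiser's rule on grouped variables, the following sketch builds an artificial correlation matrix with two groups of three variables and counts the eigenvalues exceeding 1. The helper name `kaiser_rule`, the group sizes and the correlation values (0.8 within groups, 0.1 between groups) are assumptions made purely for this example.

```python
import numpy as np

def kaiser_rule(corr):
    """Return (number of PCs with variance > 1, eigenvalues in descending order)."""
    corr = np.asarray(corr, dtype=float)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return int(np.sum(eigvals > 1.0)), eigvals

# Hypothetical structure: two groups of three variables each
within, between = 0.8, 0.1
group = np.repeat([0, 1], 3)                     # group label of each variable
same_group = group[:, None] == group[None, :]
corr = np.where(same_group, within, between)     # 6 x 6 matrix of correlations
np.fill_diagonal(corr, 1.0)                      # unit diagonal

n_retained, eigvals = kaiser_rule(corr)
print(n_retained)
print(np.round(eigvals, 3))
```

For this matrix the eigenvalues are 2.9, 2.3 and 0.2 (the last with multiplicity four), so the rule keeps two PCs, matching the number of groups. The matrix is so regular that the retained PCs are the sum of, and the contrast between, the two group totals. A sharper one-PC-per-group association would arise with between-group correlations nearer zero.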