Page 164 - Jolliffe I. Principal Component Analysis

                              Table 6.1. First six eigenvalues for the correlation matrix, blood chemistry data.

                              Component number                        1      2      3      4      5      6
                              Eigenvalue, l_k                      2.79   1.53   1.25   0.78   0.62   0.49
                              t_m = 100 \sum_{k=1}^{m} l_k / p     34.9   54.1   69.7   79.4   87.2   93.3
                              l_{k-1} - l_k                               1.26   0.28   0.47   0.16   0.13

                              to retain. In reading the concluding paragraph that follows, this message
                              should be kept firmly in mind.
                                Some procedures, such as those introduced in Sections 6.1.4 and 6.1.6,
                              are usually inappropriate because they retain, respectively, too many or too
                              few PCs in most circumstances. Some rules have been derived in particular
                              fields of application, such as atmospheric science (Sections 6.1.3, 6.1.7) or
                              psychology (Sections 6.1.3, 6.1.6) and may be less relevant outside these
                              fields than within them. The simple rules of Sections 6.1.1 and 6.1.2 seem
                              to work well in many examples, although the recommended cut-offs must
                              be treated flexibly. Ideally the threshold should not fall between two PCs
                              with very similar variances, and it may also change depending on the values
                              of n and p, and on the presence of variables with dominant
                              variances (see the examples in the next section). A large amount of research
                              has been done on rules for choosing m since the first edition of this book
                              appeared. However, it still remains true that attempts to construct rules
                              having more sound statistical foundations seem, at present, to offer little
                              advantage over the simpler rules in most circumstances.

                              6.2 Choosing m, the Number of Components:
                                    Examples

                              Two examples are given here to illustrate several of the techniques described
                              in Section 6.1; in addition, the examples of Section 6.4 include some relevant
                              discussion, and Section 6.1.8 noted a number of comparative studies.


                              6.2.1 Clinical Trials Blood Chemistry
                              These data were introduced in Section 3.3 and consist of measurements
                              of eight blood chemistry variables on 72 patients. The eigenvalues for the
                              correlation matrix are given in Table 6.1, together with the related
                              information that is required to implement the ad hoc methods described in
                              Sections 6.1.1–6.1.3.
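
As a rough illustration of how the rows of Table 6.1 can be derived and used, the following Python sketch recomputes the cumulative percentage t_m and the successive differences l_{k-1} - l_k from the six tabulated eigenvalues, with p = 8 variables. The function names are hypothetical, and the cut-offs tried (70, 80 and 90 per cent for the cumulative percentage; 1 and 0.7 for the size of an eigenvalue) are assumptions chosen for illustration rather than values taken from this passage.

```python
# Eigenvalues l_1..l_6 of the correlation matrix, as listed in Table 6.1.
eigenvalues = [2.79, 1.53, 1.25, 0.78, 0.62, 0.49]
p = 8  # number of variables; eigenvalues of a correlation matrix sum to p


def cumulative_percentage(l, p):
    """Return t_m = 100 * sum_{k=1}^{m} l_k / p for m = 1, ..., len(l)."""
    running_total = 0.0
    t = []
    for l_k in l:
        running_total += l_k
        t.append(100.0 * running_total / p)
    return t


def successive_differences(l):
    """Return l_{k-1} - l_k for k = 2, ..., len(l)."""
    return [l[k - 1] - l[k] for k in range(1, len(l))]


def smallest_m_reaching(t, cutoff):
    """Smallest m with t_m >= cutoff, or None if no listed m reaches it."""
    for m, t_m in enumerate(t, start=1):
        if t_m >= cutoff:
            return m
    return None


def count_above(l, cutoff):
    """Number of eigenvalues strictly greater than cutoff."""
    return sum(1 for l_k in l if l_k > cutoff)


t = cumulative_percentage(eigenvalues, p)
d = successive_differences(eigenvalues)

print("t_m:        ", [round(x, 1) for x in t])
print("differences:", [round(x, 2) for x in d])

# Illustrative cut-offs only (assumptions, not prescribed by the text).
for cutoff in (70, 80, 90):
    print(f"cumulative >= {cutoff}%: m =", smallest_m_reaching(t, cutoff))
for cutoff in (1.0, 0.7):
    print(f"eigenvalue > {cutoff}: m =", count_above(eigenvalues, cutoff))
```

Because the tabulated eigenvalues are rounded to two decimals, the recomputed t_m can differ from the printed row in the last digit. With the cut-offs assumed here, the suggested m ranges from three to six, which is consistent with the spread described in the text and shows how sensitive the choice is to the threshold.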
                                Looking at Table 6.1 and Figure 6.1, the three methods of Sections 6.1.1–
                              6.1.3 suggest that between three and six PCs should be retained, but the
                              decision on a single best number is not clear-cut. Four PCs account for