Page 159 - Jolliffe I. Principal Component Analysis

6. Choosing a Subset of Principal Components or Variables
                              rally, and a number of rules for selecting a subset of PCs have been put
                              forward with this context very much in mind. The LEV diagram, discussed
                              in Section 6.1.3, is one example, as is Beltrando’s (1990) method in
                              Section 6.1.6, but there are many others. In the fairly common situation where
                              different observations correspond to different time points, Preisendorfer and
                              Mobley (1988) suggest that important PCs will be those for which there is
                              a clear pattern, rather than pure randomness, present in their behaviour
                              through time. The important PCs can then be discovered by forming a
                              time series of each PC, and testing which time series are distinguishable
                              from white noise. Many tests are available for this purpose in the time
                              series literature, and Preisendorfer and Mobley (1988, Sections 5g–5j)
                              discuss the use of a number of them. This type of test is perhaps relevant
                              in cases where the set of multivariate observations forms a time series (see
                              Chapter 12), as in many atmospheric science applications, but in the more
                              usual (non-meteorological) situation where the observations are independent,
                              such techniques are irrelevant, as the values of the PCs for different
                              observations will also be independent. There is therefore no natural ordering
                              of the observations, and if they are placed in a sequence, they should
                              necessarily look like a white noise series.
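As a rough illustration of testing whether a PC time series is distinguishable from white noise, the sketch below computes a Ljung-Box portmanteau statistic and judges it against a permutation reference distribution. This is an illustrative choice, not necessarily one of the tests discussed by Preisendorfer and Mobley (1988). The function names, the lag and permutation settings, and the two artificial score series are all assumptions made for the example.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box statistic accumulated over lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def white_noise_pvalue(series, max_lag=5, n_perm=500, seed=0):
    """Permutation p-value for the white noise hypothesis.

    Shuffling removes any dependence on time order, so the shuffled
    series provide a reference distribution for Q under white noise.
    """
    rng = random.Random(seed)
    observed = ljung_box_q(series, max_lag)
    shuffled = list(series)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ljung_box_q(shuffled, max_lag) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Artificial (hypothetical) PC score series: one with a slow cycle, one pure noise.
rng = random.Random(1)
n = 120
pc_signal = [math.sin(2 * math.pi * t / 24) + 0.3 * rng.gauss(0, 1) for t in range(n)]
pc_noise = [rng.gauss(0, 1) for _ in range(n)]

p_signal = white_noise_pvalue(pc_signal)
p_noise = white_noise_pvalue(pc_noise)
print("cyclic PC p-value:", p_signal)
print("noise PC p-value:", p_noise)
```

A small p-value suggests a PC with temporal structure, which under this approach would be retained. For independent observations the ordering is arbitrary, so the test carries no information, in line with the caveat above.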
                                Chapter 5 of Preisendorfer and Mobley (1988) gives a thorough review of
                              selection rules used in atmospheric science. In Sections 5c–5e they discuss
                              a number of rules similar in spirit to the rules of Sections 6.1.3 and 6.1.4
                              above. They are, however, derived from consideration of a physical model,
                              based on spring-coupled masses (Section 5b), where it is required to
                              distinguish signal (the important PCs) from noise (the unimportant PCs). The
                              details of the rules are, as a consequence, somewhat different from those
                              of Sections 6.1.3 and 6.1.4. Two main ideas are described. The first, called
                              Rule A₄ by Preisendorfer and Mobley (1988), has a passing resemblance to
                              Bartlett’s test of equality of eigenvalues, which was defined and discussed
                              in Sections 3.7.3 and 6.1.4. Rule A₄ assumes that the last (p − q) population
                              eigenvalues are equal, and uses the asymptotic distribution of the average
                              of the last (p − q) sample eigenvalues to test whether the common population
                              value is equal to λ₀. Choosing an appropriate value for λ₀ introduces
                              a second step into the procedure and is a weakness of the rule.
                                Rule N, described in Section 5d of Preisendorfer and Mobley (1988), is
                              popular in atmospheric science. It is similar to the techniques of parallel
                              analysis, discussed in Sections 6.1.3 and 6.1.5, and involves simulating a
                              large number of uncorrelated sets of data of the same size as the real data
                              set which is to be analysed, and computing the eigenvalues of each
                              simulated data set. To assess the significance of the eigenvalues for the real
                              data set, the eigenvalues are compared to percentiles derived empirically
                              from the simulated data. The suggested rule keeps any components whose
                              eigenvalues lie above the 95% level in the cumulative distribution of the
                              simulated data. A disadvantage is that if the first eigenvalue for the data
                              is very large, it makes it difficult for later eigenvalues to exceed their own