Page 159 - Jolliffe I. Principal Component Analysis
6. Choosing a Subset of Principal Components or Variables
rally, and a number of rules for selecting a subset of PCs have been put
forward with this context very much in mind. The LEV diagram, discussed
in Section 6.1.3, is one example, as is Beltrando’s (1990) method in Sec-
tion 6.1.6, but there are many others. In the fairly common situation where
different observations correspond to different time points, Preisendorfer and
Mobley (1988) suggest that important PCs will be those for which there is
a clear pattern, rather than pure randomness, present in their behaviour
through time. The important PCs can then be discovered by forming a
time series of each PC, and testing which time series are distinguishable
from white noise. Many tests are available for this purpose in the time
series literature, and Preisendorfer and Mobley (1988, Sections 5g–5j) dis-
cuss the use of a number of them. This type of test is perhaps relevant
in cases where the set of multivariate observations forms a time series (see
Chapter 12), as in many atmospheric science applications, but in the more
usual (non-meteorological) situation where the observations are indepen-
dent, such techniques are irrelevant, as the values of the PCs for different
observations will also be independent. There is therefore no natural order-
ing of the observations, and if they are placed in a sequence, they should
necessarily look like a white noise series.
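One way to operationalise this idea is to form the time series of each PC score and test it against white noise with a portmanteau statistic. The sketch below is an illustration of the general approach, not Preisendorfer and Mobley's own tests: it uses a hand-rolled Ljung-Box test on synthetic data in which only the first variable carries temporal structure, so only the leading PC should be flagged as important.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_pvalue(series, n_lags=10):
    """p-value of the Ljung-Box portmanteau test of H0: 'series is white noise'."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    lags = np.arange(1, n_lags + 1)
    acf = np.array([np.dot(x[:-k], x[k:]) / denom for k in lags])
    q_stat = n * (n + 2) * np.sum(acf ** 2 / (n - lags))
    return chi2.sf(q_stat, df=n_lags)

# Synthetic example: column 0 carries a smooth signal through time,
# the remaining columns are pure noise.
rng = np.random.default_rng(0)
n = 300
t = np.arange(n)
data = rng.standard_normal((n, 5))
data[:, 0] += 3.0 * np.sin(2.0 * np.pi * t / 50.0)

# Form the time series of each PC: project the centred data onto the
# eigenvectors of the sample covariance matrix, largest eigenvalue first.
centred = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
scores = centred @ eigvecs[:, np.argsort(eigvals)[::-1]]

# PCs whose score series are distinguishable from white noise (small
# p-value) would be judged important under this kind of rule.
p_values = [ljung_box_pvalue(scores[:, j]) for j in range(scores.shape[1])]
```

Here the first PC, which picks up the sinusoidal signal, yields a very small p-value, while the noise PCs do not.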
Chapter 5 of Preisendorfer and Mobley (1988) gives a thorough review of
selection rules used in atmospheric science. In Sections 5c–5e they discuss
a number of rules similar in spirit to the rules of Sections 6.1.3 and 6.1.4
above. They are, however, derived from consideration of a physical model,
based on spring-coupled masses (Section 5b), where it is required to distin-
guish signal (the important PCs) from noise (the unimportant PCs). The
details of the rules are, as a consequence, somewhat different from those
of Sections 6.1.3 and 6.1.4. Two main ideas are described. The first, called
Rule A4 by Preisendorfer and Mobley (1988), has a passing resemblance to
Bartlett's test of equality of eigenvalues, which was defined and discussed
in Sections 3.7.3 and 6.1.4. Rule A4 assumes that the last (p − q) population
eigenvalues are equal, and uses the asymptotic distribution of the average
of the last (p − q) sample eigenvalues to test whether the common popula-
tion value is equal to λ0. Choosing an appropriate value for λ0 introduces
a second step into the procedure and is a weakness of the rule.
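A rough sketch of the logic of such a rule, though not Preisendorfer and Mobley's exact statistic, can be built on the standard asymptotic approximation that, when the last p − q population eigenvalues all equal λ0, the mean of the last p − q sample eigenvalues is approximately normal with mean λ0 and variance 2λ0²/(n(p − q)):

```python
import numpy as np
from scipy.stats import norm

def rule_a4_pvalue(sample_eigvals, q, lambda0, n):
    """Two-sided z-test of H0: the last p - q population eigenvalues equal lambda0.

    Uses the rough asymptotic approximation that, under H0, the mean of the
    last p - q sample eigenvalues is ~ N(lambda0, 2 * lambda0**2 / (n * (p - q))),
    where n is the sample size. Illustrative only.
    """
    ell = np.sort(np.asarray(sample_eigvals, dtype=float))[::-1]
    p = len(ell)
    tail_mean = ell[q:].mean()
    std_err = lambda0 * np.sqrt(2.0 / (n * (p - q)))
    z = (tail_mean - lambda0) / std_err
    return 2.0 * norm.sf(abs(z))
```

A small p-value rejects the hypothesis that the common population value is λ0; note that the user must still supply λ0, which is precisely the second step criticised above.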
Rule N, described in Section 5d of Preisendorfer and Mobley (1988), is
popular in atmospheric science. It is similar to the techniques of parallel
analysis, discussed in Sections 6.1.3 and 6.1.5, and involves simulating a
large number of uncorrelated sets of data of the same size as the real data
set which is to be analysed, and computing the eigenvalues of each sim-
ulated data set. To assess the significance of the eigenvalues for the real
data set, the eigenvalues are compared to percentiles derived empirically
from the simulated data. The suggested rule keeps any components whose
eigenvalues lie above the 95% level in the cumulative distribution of the
simulated data. A disadvantage is that if the first eigenvalue for the data
is very large, this makes it difficult for later eigenvalues to exceed their own
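The simulation scheme just described can be sketched as follows; the function name, the 95% level, and the number of simulations are illustrative choices, not Preisendorfer and Mobley's implementation.

```python
import numpy as np

def rule_n(data, n_sims=200, level=0.95, seed=0):
    """Rule N-style parallel analysis: return the indices of PCs whose
    correlation-matrix eigenvalues exceed the rank-wise `level` quantile of
    eigenvalues computed from uncorrelated Gaussian data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the real data's correlation matrix, largest first.
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    # Eigenvalues from many simulated uncorrelated data sets of the same size.
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    # Empirical percentile for each eigenvalue rank; keep PCs that exceed it.
    thresholds = np.quantile(sims, level, axis=0)
    return np.flatnonzero(real > thresholds)
```

For data in which a few variables share a strong common factor, the leading eigenvalue comfortably clears its simulated threshold and the corresponding PC is retained.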

