Page 429 - Jolliffe I. Principal Component Analysis
14. Generalizations and Adaptations of Principal Component Analysis
be used to obtain improved estimates of the coefficients B in the equation
predicting y from x. Kloek and Mennes (1960) examine a number of ways
in which PCs of w, PCs of the residuals obtained from regressing w on x,
or PCs of the combined vector containing all elements of w and x can
be used as ‘instrumental variables’ in order to obtain improved estimates
of the coefficients B.
14.4 Alternatives to Principal Component Analysis
for Non-Normal Distributions
We have noted several times that for many purposes it is not necessary to
assume any particular distribution for the variables x in a PCA, although
some of the properties of Chapters 2 and 3 rely on the assumption of
multivariate normality.
One way of handling possible non-normality, especially if the distribution
has heavy tails, is to use robust estimation of the covariance or correla-
tion matrix, or of the PCs themselves. The estimates may be designed
to allow for the presence of aberrant observations in general, or may be
based on a specific non-normal distribution with heavier tails, as in Bac-
cini et al. (1996) (see Section 10.4). In inference, confidence intervals or
tests of hypothesis may be constructed without any need for distributional
assumptions using the bootstrap or jackknife (Section 3.7.2). The paper
by Dudziński et al. (1995), which was discussed in Section 10.3, investigates
the effect of non-normality on repeatability of PCA, albeit in a small
simulation study.
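As an illustration of distribution-free inference of the kind mentioned above, the following is a minimal sketch of percentile bootstrap intervals for the eigenvalues of a sample covariance matrix. It is not taken from Section 3.7.2; the function name `bootstrap_eigenvalue_ci`, the simulated heavy-tailed data (a multivariate t with 5 degrees of freedom) and the choice of the percentile method are all assumptions made for the example.

```python
import numpy as np

def bootstrap_eigenvalue_ci(X, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for covariance-matrix eigenvalues.

    Hypothetical helper for illustration: rows of X are observations.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample rows with replacement
        S = np.cov(X[idx], rowvar=False)        # covariance of the resample
        boot[b] = np.linalg.eigvalsh(S)[::-1]   # eigenvalues, largest first
    lower = np.quantile(boot, alpha / 2, axis=0)
    upper = np.quantile(boot, 1 - alpha / 2, axis=0)
    return lower, upper

# Simulated heavy-tailed data (assumption): multivariate t with 5 d.f.,
# built as a scaled normal vector divided by sqrt(chi-square / d.f.).
rng = np.random.default_rng(1)
n, p = 200, 3
z = rng.standard_normal((n, p)) * np.array([3.0, 2.0, 1.0])
w = rng.chisquare(5, size=n) / 5
X = z / np.sqrt(w)[:, None]

eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
lower, upper = bootstrap_eigenvalue_ci(X, n_boot=500)
for k in range(p):
    print(f"l_{k + 1} = {eigvals[k]:.2f}, 95% interval ({lower[k]:.2f}, {upper[k]:.2f})")
```

No normality assumption enters the calculation; the intervals rely only on resampling the observed rows. Other bootstrap intervals (for example bias-corrected ones) could be substituted for the percentile method used here.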
Another possibility is to assume that the vector x of random variables
has a known distribution other than the multivariate normal. A number
of authors have investigated the case of elliptical distributions, of which
the multivariate normal is a special case. For example, Waternaux (1984)
considers the usual test statistic for the null hypothesis H_{0q}, as defined in
Section 6.1.4, of equality of the last (p−q) eigenvalues of the covariance ma-
trix. She shows that, with an adjustment for kurtosis, the same asymptotic
distribution for the test statistic is valid for all elliptical distributions with
finite fourth moments. Jensen (1986) takes this further by demonstrating
that for a range of hypotheses relevant to PCA, tests based on a multivari-
ate normal assumption have identical level and power for all distributions
with ellipsoidal contours, even those without second moments. Things get
more complicated outside the class of elliptical distributions, as shown by
Waternaux (1984) for H_{0q}.
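To make the idea of a kurtosis adjustment concrete, the sketch below computes a likelihood-ratio-type statistic for H_{0q} and a kurtosis-adjusted version. The exact form of the adjustment is not given in this section, so the code rests on labelled assumptions: the statistic is taken as n[(p−q) ln(mean of the last p−q eigenvalues) − sum of their logarithms], without any finite-sample correction to n; the adjustment is assumed to be division by (1 + κ̂); and κ̂ is a moment estimate derived from Mardia's multivariate kurtosis. The function names are hypothetical.

```python
import numpy as np

def kurtosis_parameter(X):
    """Moment estimate of the kurtosis parameter kappa (assumed estimator).

    Uses Mardia's multivariate kurtosis b = mean(d_i^4), where d_i^2 are
    squared Mahalanobis distances; for an elliptical distribution
    E[d^4] = p(p+2)(1+kappa), so kappa = 0 for the multivariate normal.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)
    return np.mean(d2 ** 2) / (p * (p + 2)) - 1.0

def equality_statistic(X, q):
    """Statistic for equality of the last (p - q) covariance eigenvalues.

    Returns (raw, adjusted, df). The adjusted value divides the raw
    statistic by (1 + kappa_hat); this form is an assumption of the sketch.
    """
    n, p = X.shape
    l = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # largest first
    tail = l[q:]
    m = p - q
    raw = n * (m * np.log(tail.mean()) - np.sum(np.log(tail)))
    df = (m + 2) * (m - 1) // 2          # chi-squared degrees of freedom
    return raw, raw / (1.0 + kurtosis_parameter(X)), df

# Simulated elliptical data (assumption): multivariate t with 8 d.f. and
# the last two population eigenvalues equal, so H_{0q} holds with q = 1.
rng = np.random.default_rng(0)
n, p, q = 500, 3, 1
z = rng.standard_normal((n, p)) * np.array([2.0, 1.0, 1.0])
X = z / np.sqrt(rng.chisquare(8, size=n) / 8)[:, None]

raw, adjusted, df = equality_statistic(X, q)
print(f"raw = {raw:.2f}, adjusted = {adjusted:.2f}, df = {df}")
```

Under these assumptions the adjusted statistic, rather than the raw one, would be referred to a chi-squared distribution with the stated degrees of freedom when the data are elliptical with finite fourth moments.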
Jensen (1987) calls the linear functions of x that successively maximize
‘scatter’ of a conditional distribution, where conditioning is on previously
derived linear functions, principal variables. Unlike McCabe’s (1984) usage
of the same phrase, these ‘principal variables’ are not a subset of the original

