Page 430 - Jolliffe I. Principal Component Analysis

14.4. PCA for Non-Normal Distributions
variables, but linear functions of them. Jensen (1997) shows that when
‘scatter’ is defined as variance and x has a multivariate normal distribution,
then his principal variables turn out to be the principal components. This
result is discussed following the spectral decomposition of the covariance
matrix (Property A3) in Section 2.1. Jensen (1997) greatly extends the
result by showing that for a family of elliptical distributions and for a
wide class of definitions of scatter, his principal variables are the same as
principal components.
An idea which may be considered an extension of PCA to non-normal
data is described by Qian et al. (1994). They investigate linear
transformations of the p-variable vector x to q (<p) derived variables y that
minimize what they call an index of predictive power. This index is based
on minimum description length or stochastic complexity (see, for example,
Rissanen and Yu (2000)) and measures the difference in stochastic
complexity between x and y. The criterion is such that the optimal choice of
y depends on the probability distribution of x, and Qian and coworkers
(1994) show that for multivariate normal x, the derived variables y are
the first q PCs. This can be viewed as an additional property of PCA but,
confusingly, they take it as a definition of principal components. As a
result, their ‘principal components’ differ from the usual principal
components when the distribution of x is non-normal. They discuss various
properties of their components and include a series of hypothesis tests
for deciding how many components are needed to adequately represent all
the original variables.
Another possible extension of PCA to non-normal data is hinted at by
O’Hagan (1994, Section 2.15). For a multivariate normal distribution, the
covariance matrix is given by the negative of the inverse ‘curvature’ of the
log-probability density function, where ‘curvature’ is defined as the matrix
of second derivatives with respect to the elements of x. In the Bayesian
set-up where x is replaced by a vector of parameters θ, O’Hagan (1994) refers
to the curvature evaluated at the modal value of θ as the modal dispersion
matrix. He suggests finding eigenvectors, and hence principal axes, based
on this matrix, which is typically not the covariance matrix for non-normal
distributions.
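The normal-case statement can be checked directly. If log f(x) = const − ½(x − μ)′Σ⁻¹(x − μ), then the matrix of second derivatives is −Σ⁻¹, and its negative inverse is Σ. The sketch below is a minimal illustration, not O’Hagan’s own procedure. It approximates the curvature at the mode by central finite differences, takes the negative inverse, and extracts eigenvectors as principal axes. It then repeats the calculation for a multivariate t density, where the result is a rescaled Σ rather than the covariance matrix. Several choices here are assumptions made for illustration: the helper names, the step size `h`, the parameter values `mu`, `Sigma` and `nu`, and the convention of calling the negative inverse curvature the modal dispersion matrix. Because the curvature matrix and its negative inverse share eigenvectors, the principal axes do not depend on that convention.

```python
import numpy as np


def numerical_hessian(f, x0, h=1e-4):
    """Central-difference approximation to the matrix of second derivatives of f at x0."""
    p = x0.size
    steps = h * np.eye(p)  # row i is the step vector h * e_i
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            H[i, j] = (f(x0 + steps[i] + steps[j]) - f(x0 + steps[i] - steps[j])
                       - f(x0 - steps[i] + steps[j]) + f(x0 - steps[i] - steps[j])) / (4.0 * h * h)
    return 0.5 * (H + H.T)  # symmetrize to remove rounding asymmetry


def modal_dispersion(logpdf, mode):
    """Negative inverse of the log-density 'curvature' evaluated at the mode (assumed convention)."""
    return -np.linalg.inv(numerical_hessian(logpdf, mode))


def principal_axes(M):
    """Eigenvalues (descending) and unit eigenvectors (columns) of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


# Hypothetical parameter values chosen only for this sketch.
mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
nu, p = 5.0, 2


def normal_logpdf(x):
    """Multivariate normal log density, dropping the additive normalizing constant."""
    d = x - mu
    return -0.5 * d @ Sigma_inv @ d


def t_logpdf(x):
    """Multivariate t log density (nu d.f., scale matrix Sigma), dropping the constant."""
    d = x - mu
    return -0.5 * (nu + p) * np.log1p(d @ Sigma_inv @ d / nu)


D_normal = modal_dispersion(normal_logpdf, mu)  # analytically equal to Sigma
D_t = modal_dispersion(t_logpdf, mu)            # analytically nu / (nu + p) * Sigma
cov_t = nu / (nu - 2.0) * Sigma                 # actual covariance of the t (nu > 2)

print(np.round(D_normal, 4))
print(np.round(D_t, 4), np.round(cov_t, 4), sep="\n")
print(principal_axes(D_t)[1])  # columns are the principal axes
```

For the normal density the modal dispersion matrix coincides with the covariance matrix. For the t density it is ν/(ν + p)Σ, whereas the covariance is ν/(ν − 2)Σ. Because the t is elliptical, the two matrices still share eigenvectors and only the scale changes. For non-elliptical densities, the eigenvectors themselves can differ.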

14.4.1 Independent Component Analysis
The technique, or family of techniques, known as independent component
analysis (ICA) has been the subject of a large amount of research, starting
in the late 1980s, especially in signal processing. It has been applied to
various biomedical and imaging problems, and is beginning to be used in
other fields such as atmospheric science. By the end of the 1990s it had
its own annual workshops and at least one book (Lee, 1998). Although
it is sometimes presented as a competitor to PCA, the links are not
particularly strong (as we see below, it seems closer to factor analysis), so the
