Page 430 - Jolliffe I. Principal Component Analysis

                                                      14.4. PCA for Non-Normal Distributions
                              variables, but linear functions of them. Jensen (1997) shows that when
                              ‘scatter’ is defined as variance and x has a multivariate normal distribution,
                              then his principal variables turn out to be the principal components. This
                              result is discussed following the spectral decomposition of the covariance
                              matrix (Property A3) in Section 2.1. Jensen (1997) greatly extends the
                              result by showing that for a family of elliptical distributions and for a
                              wide class of definitions of scatter, his principal variables are the same as
                              principal components.
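
For reference, the spectral decomposition referred to above can be written as follows. The notation is assumed here for illustration and is not taken from this page: \Sigma is the covariance matrix of x, \lambda_k are its eigenvalues in decreasing order, and \alpha_k are the corresponding unit-length eigenvectors.

```latex
\Sigma = \sum_{k=1}^{p} \lambda_k \, \alpha_k \alpha_k^{\prime},
\qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,
\qquad z_k = \alpha_k^{\prime} x, \quad \operatorname{var}(z_k) = \lambda_k .
```

In this notation z_k is the kth principal component, so each term of the sum is the contribution of one PC to the covariance matrix.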
                                An idea which may be considered an extension of PCA to non-normal
                              data is described by Qian et al. (1994). They investigate linear transfor-
                              mations of the p-variable vector x to q (<p) derived variables y that
                              maximize what they call an index of predictive power. This index is based
                              on minimum description length or stochastic complexity (see, for example,
                              Rissanen and Yu (2000)) and measures the difference in stochastic com-
                              plexity between x and y. The criterion is such that the optimal choice of
                              y depends on the probability distribution of x, and Qian and coworkers
                              (1994) show that for multivariate normal x, the derived variables y are
                              the first q PCs. This can be viewed as an additional property of PCA, but
                              confusingly they take it as a definition of principal components. This leads
                              to their ‘principal components’ being different from the usual principal
                              components when the distribution of x is non-normal. They discuss various
                              properties of their components and include a series of tests of hypotheses
                              for deciding how many components are needed to adequately represent all
                              the original variables.
                                Another possible extension of PCA to non-normal data is hinted at by
                              O’Hagan (1994, Section 2.15). For a multivariate normal distribution, the
                              covariance matrix is given by the negative of the inverse ‘curvature’ of the
                              log-probability density function, where ‘curvature’ is defined as the matrix
                              of second derivatives with respect to the elements of x. In the Bayesian set-
                              up where x is replaced by a vector of parameters θ, O’Hagan (1994) refers
                              to the curvature evaluated at the modal value of θ as the modal dispersion
                              matrix. He suggests finding eigenvectors, and hence principal axes, based
                              on this matrix, which is typically not the covariance matrix for non-normal
                              distributions.
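
The relationship between curvature and covariance in the normal case can be checked numerically. The following is a minimal sketch, not O'Hagan's procedure: the helper names, the example mean and covariance matrix, and the finite-difference Hessian are all assumptions made for illustration. By analogy with the normal case, the dispersion matrix is taken here to be the negative inverse of the curvature at the mode. For a multivariate normal density this recovers the covariance matrix, so its eigenvectors are the usual principal axes; for a non-normal density the same recipe would generally give a different matrix.

```python
import numpy as np

def log_density_mvn(x, mean, cov):
    """Log-density of a multivariate normal, up to an additive constant."""
    d = x - mean
    return -0.5 * d @ np.linalg.solve(cov, d)

def numerical_hessian(f, x0, h=1e-3):
    """Central finite-difference matrix of second derivatives of f at x0."""
    p = x0.size
    hess = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p)
            ej = np.zeros(p)
            ei[i] = h
            ej[j] = h
            hess[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                          - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4.0 * h * h)
    return hess

# Hypothetical example values (not from the text).
mean = np.array([1.0, -2.0, 0.5])
cov = np.array([[4.0, 1.2, 0.5],
                [1.2, 2.0, 0.3],
                [0.5, 0.3, 1.0]])

# Curvature of the log-density at the mode (for a normal, the mode is the mean).
curvature = numerical_hessian(lambda x: log_density_mvn(x, mean, cov), mean)

# Negative inverse curvature; for the normal case this equals cov.
dispersion = -np.linalg.inv(curvature)

# Principal axes: eigenvectors ordered by decreasing eigenvalue.
evals, evecs = np.linalg.eigh(dispersion)
order = np.argsort(evals)[::-1]
axis_variances = evals[order]
principal_axes = evecs[:, order]

print(np.round(dispersion, 4))
print(np.round(axis_variances, 4))
```

To apply the same idea to a non-normal log-density, one would replace `log_density_mvn` with that function and evaluate the curvature at its mode, which in general has to be found numerically.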


                              14.4.1 Independent Component Analysis
                              The technique, or family of techniques, known as independent component
                              analysis (ICA) has been the subject of a large amount of research, starting
                              in the late 1980s, especially in signal processing. It has been applied to
                              various biomedical and imaging problems, and is beginning to be used in
                              other fields such as atmospheric science. By the end of the 1990s it had
                              its own annual workshops and at least one book (Lee, 1998). Although
                              it is sometimes presented as a competitor to PCA, the links are not particularly
                              strong (as we see below, it seems closer to factor analysis), so the