Page 414 - Jolliffe I. Principal Component Analysis

14.1. Additive Principal Components and Principal Curves
                                It follows from the discussion in Section 2.2 that for multivariate normal
                              and elliptical distributions the first principal component defines a principal
                              curve, though there may also be other principal curves which are differ-
                              ent from the first PC. Hence, non-linear principal curves may be thought
                              of as a generalization of the first PC for other probability distributions.
                              The discussion so far has been in terms of probability distributions, but a
                              similar idea can be defined for samples. In this case a curve is fitted itera-
                              tively, alternating between ‘projection’ and ‘conditional-expectation’ steps.
                              In a projection step, the closest point on the current curve is found for each
                              observation in the sample, and the conditional-expectation step then calcu-
                              lates the average of observations closest to each point on the curve. These
                              averages form a new curve to be used in the next projection step. In a
                              finite data set there will usually be at most one observation correspond-
                              ing to a given point on the curve, so some sort of smoothing is required
                              to form the averages. Hastie and Stuetzle (1989) provide details of some
                              possible smoothing schemes, together with examples. They also discuss the
                              possibility of extension from curves to higher-dimensional surfaces.
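The alternating projection and conditional-expectation steps just described can be sketched in code (a minimal sketch assuming NumPy; the function name `principal_curve`, the moving-average smoother, and the `span` parameter are illustrative stand-ins for the scatterplot smoothers Hastie and Stuetzle actually use):

```python
import numpy as np

def principal_curve(X, n_iter=10, span=0.3):
    """Toy principal-curve fit alternating smoothing and projection steps."""
    n, p = X.shape
    # Initialize the curve as the first principal component line.
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    lam = (X - mu) @ Vt[0]               # latent parameter for each observation
    curve = mu + np.outer(lam, Vt[0])    # one curve point per observation
    k = max(2, int(span * n))            # moving-average window size

    for _ in range(n_iter):
        # 'Conditional expectation': average observations whose latent
        # parameters are adjacent in the current ordering.
        order = np.argsort(lam)
        smoothed = np.empty_like(curve)
        for rank, i in enumerate(order):
            lo, hi = max(0, rank - k // 2), min(n, rank + k // 2 + 1)
            smoothed[i] = X[order[lo:hi]].mean(axis=0)
        curve = smoothed
        # 'Projection': reassign each observation to its nearest curve point.
        d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=-1)
        lam = lam[d2.argmin(axis=1)]
    return curve, lam
```

Starting from the first PC line means the straight-line solution is recovered when the data really are linear; the smoothing step is what allows the curve to bend.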
                                Tarpey (1999) describes a ‘lack-of-fit’ test that can be used to decide
                              whether or not a principal curve is simply the first PC. The test involves
                              the idea of principal points which, for populations, are defined as follows.
                               Suppose that x is a p-variate random vector and y is a discrete p-variate
                               random vector, taking only the k values y1, y2, ..., yk. If y is such that
                               E[‖x − y‖²] is minimized over all possible choices of the k values for
                               y, then y1, y2, ..., yk are the k principal points for the distribution of x.
                               There is a connection with self-consistency, as y is self-consistent for x
                              in this case. Flury (1993) discusses several methods for finding principal
                              points in a sample.
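The population definition has a direct sample analogue: minimizing the average of ‖x − y‖² over k candidate points is exactly the k-means criterion, so a Lloyd-style iteration is one way to estimate sample principal points (a toy sketch; the function name and defaults are invented, and Flury (1993) discusses more refined methods):

```python
import numpy as np

def sample_principal_points(X, k, n_iter=50, seed=0):
    """Lloyd-style (k-means) iteration: empirically minimizes the average
    of ||x - y||^2 over k points, i.e. estimates sample principal points."""
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each observation to its nearest candidate point ...
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # ... then move each candidate to the mean of its assigned observations.
        Y = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else Y[j]
                      for j in range(k)])
    return Y
```

For a standard normal sample, the two estimated principal points should sit near ±√(2/π) ≈ ±0.80, the known population values for k = 2.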
                                There is another link between principal points and principal components,
                              namely that if x has a multivariate normal or elliptical distribution, and the
                               principal points y1, y2, ..., yk for the distribution lie in a subspace of
                               dimension q (< p), then that subspace is identical to the one spanned by
                               the vectors of coefficients
                              defining the first q PCs of x (Flury, 1995, Theorem 2.3). Tarpey (2000)
                              introduces the idea of parallel principal axes, which are parallel hyperplanes
                              orthogonal to the axis defined by the first PC that intersect that axis at the
                              principal points of the marginal distribution of x along the axis. He shows
                              that self-consistency of parallel principal axes characterizes multivariate
                              normal distributions.
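Flury's subspace result lends itself to a quick empirical check: for a bivariate normal and k = 2, the estimated principal points should lie on a line through the mean in the direction of the first PC (a hypothetical illustration; the `principal_points` helper and the covariance matrix are invented for the demo):

```python
import numpy as np

# Hypothetical helper: Lloyd iteration for sample principal points.
def principal_points(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1).argmin(1)
        Y = np.array([X[labels == j].mean(0) if np.any(labels == j) else Y[j]
                      for j in range(k)])
    return Y

# Correlated bivariate normal sample (covariance chosen arbitrarily for the demo).
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 1.0]], size=20000)

Y = principal_points(X, k=2)
_, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
v1 = Vt[0]                                  # first PC direction

# The line through the two principal points should align with v1.
span = Y[1] - Y[0]
cos = abs(span @ v1) / np.linalg.norm(span)
print(round(float(cos), 3))                 # typically very close to 1
```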


                              14.1.3 Non-Linearity Using Neural Networks
                              A considerable amount of work has been done on PCA in the context of
                              neural networks. Indeed, there is a book on the subject (Diamantaras and
                              Kung, 1996) which gives a good overview. Here we describe only those
                              developments that provide non-linear extensions of PCA. Computational
                              matters are discussed in Appendix A1, and other aspects of the PCA/neural