14.1. Additive Principal Components and Principal Curves
It follows from the discussion in Section 2.2 that for multivariate normal
and elliptical distributions the first principal component defines a principal
curve, though there may also be other principal curves which are differ-
ent from the first PC. Hence, non-linear principal curves may be thought
of as a generalization of the first PC for other probability distributions.
The discussion so far has been in terms of probability distributions, but a
similar idea can be defined for samples. In this case a curve is fitted itera-
tively, alternating between ‘projection’ and ‘conditional-expectation’ steps.
In a projection step, the closest point on the current curve is found for each
observation in the sample, and the conditional-expectation step then calcu-
lates the average of observations closest to each point on the curve. These
averages form a new curve to be used in the next projection step. In a
finite data set there will usually be at most one observation correspond-
ing to a given point on the curve, so some sort of smoothing is required
to form the averages. Hastie and Stuetzle (1989) provide details of some
possible smoothing schemes, together with examples. They also discuss the
possibility of extension from curves to higher-dimensional surfaces.
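As a concrete illustration of the alternation just described, the following Python sketch fits a principal curve to a two-dimensional sample, initializing the curve at the first PC line and using a simple Gaussian kernel smoother for the conditional-expectation step. The function name principal_curve and the bandwidth parameter span are illustrative choices, not Hastie and Stuetzle's specific smoothing schemes.

```python
import numpy as np

def principal_curve(X, n_iter=10, span=0.2, n_grid=100):
    """Minimal sketch of the projection / conditional-expectation
    alternation for fitting a principal curve to a sample X."""
    # Initialise the projection index with the first PC.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ Vt[0]                      # projection index per observation
    grid = np.linspace(lam.min(), lam.max(), n_grid)
    h = span * (lam.max() - lam.min())    # kernel bandwidth for smoothing

    for _ in range(n_iter):
        # Conditional-expectation step: average observations near each
        # grid point (the smoothing needed with finite data).
        w = np.exp(-0.5 * ((grid[:, None] - lam[None, :]) / h) ** 2)
        w /= w.sum(axis=1, keepdims=True)
        curve = w @ X                     # (n_grid, p) smoothed curve

        # Projection step: re-assign each observation to the closest
        # point on the updated curve.
        d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
        lam = grid[d2.argmin(axis=1)]
    return grid, curve, lam

# Example: a noisy arc, for which the principal curve is non-linear.
rng = np.random.default_rng(0)
t = rng.uniform(0, np.pi, 300)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(300, 2))
_, curve, _ = principal_curve(X)
```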
Tarpey (1999) describes a ‘lack-of-fit’ test that can be used to decide
whether or not a principal curve is simply the first PC. The test involves
the idea of principal points which, for populations, are defined as follows.
Suppose that x is a p-variate random vector and y is a discrete p-variate
random vector, taking only the k values y1, y2, ..., yk. If y is such that
E[‖x − y‖²] is minimized over all possible choices of the k values for
y, then y1, y2, ..., yk are the k principal points for the distribution of x.
There is a connection with self-consistency, as y is self-consistent for x
in this case. Flury (1993) discusses several methods for finding principal
points in a sample.
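In a sample, minimizing the average of ‖x − y‖² over k points is precisely the k-means clustering criterion, so k-means cluster centres provide one natural estimate of sample principal points. The short Python sketch below uses scikit-learn's KMeans for this purpose; it is one convenient estimator, not any specific method from Flury (1993).

```python
import numpy as np
from sklearn.cluster import KMeans

# Estimate k = 3 sample principal points of a bivariate normal sample.
# k-means minimises the sample analogue of E[||x - y||^2], so its
# cluster centres estimate the principal points.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[4, 0], [0, 1]], size=2000)
km = KMeans(n_clusters=3, n_init=20, random_state=0).fit(X)
print(km.cluster_centers_)   # estimated principal points
```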
There is another link between principal points and principal components,
namely that if x has a multivariate normal or elliptical distribution, and the
principal points y1, y2, ..., yk for the distribution lie in a q (< p)-dimensional subspace,
then the subspace is identical to that spanned by the vectors of coefficients
defining the first q PCs of x (Flury, 1995, Theorem 2.3). Tarpey (2000)
introduces the idea of parallel principal axes, which are parallel hyperplanes
orthogonal to the axis defined by the first PC that intersect that axis at the
principal points of the marginal distribution of x along the axis. He shows
that self-consistency of parallel principal axes characterizes multivariate
normal distributions.
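Flury's subspace result can be checked empirically. The following sketch (parameters illustrative, again using k-means as the sample estimator of principal points) draws from a normal distribution whose first PC is the first coordinate axis and confirms that the line through the k = 2 estimated principal points is aligned with that axis.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustration of Flury (1995, Theorem 2.3): for a normal distribution,
# principal points lying in a q-dimensional subspace span the same
# subspace as the first q PCs. Here k = 2 points lie on a line (q = 1).
rng = np.random.default_rng(2)
cov = np.diag([9.0, 1.0, 0.5])          # first PC is the first axis
X = rng.multivariate_normal(np.zeros(3), cov, size=5000)
centres = KMeans(n_clusters=2, n_init=20, random_state=0).fit(X).cluster_centers_
direction = centres[1] - centres[0]
direction /= np.linalg.norm(direction)
print(direction)   # close to (±1, 0, 0), the first PC direction of cov
```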
14.1.3 Non-Linearity Using Neural Networks
A considerable amount of work has been done on PCA in the context of
neural networks. Indeed, there is a book on the subject (Diamantaras and
Kung, 1996) which gives a good overview. Here we describe only those
developments that provide non-linear extensions of PCA. Computational
matters are discussed in Appendix A1, and other aspects of the PCA/neural