Since Kramer’s paper appeared, a number of authors in the neural network literature have noted limitations of his procedure and have suggested
alternatives or modifications (see, for example, Jia et al., 2000), although
it is now used in a range of disciplines including climatology (Monahan,
2001). Dong and McAvoy (1996) propose an algorithm that combines the
principal curves of Section 14.1.2 (Hastie and Stuetzle, 1989) with the auto-
associative neural network set-up of Kramer (1991). Principal curves alone
do not allow the calculation of ‘scores’ with respect to the curves for new
observations, but their combination with a neural network enables such
quantities to be computed.
An alternative approach, based on a so-called input-training net, is suggested by Tan and Mavrovouniotis (1995). In such networks, the inputs are
not fixed, but are trained along with the other parameters of the network.
With a single input the results of the algorithm are equivalent to principal
curves, but with a larger number of inputs there is increased flexibility to
go beyond the additive model underlying principal curves.
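To make the idea concrete, the following is a minimal sketch of input training, not Tan and Mavrovouniotis’s (1995) actual algorithm: the low-dimensional inputs are treated as free parameters and adjusted by gradient descent, jointly with the weights of a small decoder network, so as to minimize the squared reconstruction error. The toy data, the architecture and all names are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: noisy points on a quadratic curve in two dimensions.
    n = 200
    s = rng.uniform(-1.0, 1.0, n)
    X = np.column_stack([s, s**2]) + 0.05 * rng.standard_normal((n, 2))
    X -= X.mean(axis=0)

    q, h, p = 1, 10, X.shape[1]             # latent, hidden, observed dimensions
    T = 0.1 * rng.standard_normal((n, q))   # trainable 'inputs', one row per point
    W1 = 0.5 * rng.standard_normal((q, h)); b1 = np.zeros(h)
    W2 = 0.5 * rng.standard_normal((h, p)); b2 = np.zeros(p)

    lr = 0.05
    for _ in range(3000):
        H = np.tanh(T @ W1 + b1)            # decode the current inputs
        R = H @ W2 + b2 - X                 # reconstruction residuals
        gH = (R @ W2.T) * (1.0 - H**2)      # back-propagated squared error
        # Weights follow the average gradient over all points; each input T[i]
        # follows the gradient of its own point's reconstruction error.
        gW2, gb2 = H.T @ R / n, R.mean(axis=0)
        gW1, gb1 = T.T @ gH / n, gH.mean(axis=0)
        gT = gH @ W1.T
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
        T -= lr * gT                        # the 'input-training' step

    H = np.tanh(T @ W1 + b1)
    print("mean squared reconstruction error:", np.mean((H @ W2 + b2 - X) ** 2))

With q = 1 the trained inputs play the role of arc-length parameters along a one-dimensional curve; taking q greater than 1 gives the extra flexibility referred to above.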
Jia et al. (2000) use Tan and Mavrovouniotis’s (1995) input-training
net, but have an ordinary linear PCA as a preliminary step. The non-
linear algorithm is then conducted on the first m linear PCs, where m
is chosen to be sufficiently large, ensuring that only PCs with very small
variances are excluded. Jia and coworkers suggest that around 97% of the
total variance should be retained to avoid discarding dimensions that might
include important non-linear variation. The non-linear components are used
in process control (see Section 13.7), and in an example they give improved fault detection compared to linear PCs. The preliminary
step reduces the dimensionality of the data from 37 variables to 12 linear
PCs, whilst retaining 98% of the variation.
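As a rough illustration of this preliminary step, m can be taken as the smallest number of PCs whose cumulative proportion of variance reaches a chosen target. The 97% threshold, the function name and the synthetic data below are illustrative assumptions, not taken from Jia et al. (2000).

    import numpy as np

    def n_components_for_variance(X, target=0.97):
        """Smallest m whose first m PCs retain at least `target` of the variance."""
        Xc = X - X.mean(axis=0)
        # PC variances are the squared singular values of the centred data
        # matrix divided by n - 1.
        var = np.linalg.svd(Xc, compute_uv=False) ** 2 / (len(X) - 1)
        cumulative = np.cumsum(var) / var.sum()
        return int(np.searchsorted(cumulative, target) + 1)

    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 37)) @ rng.standard_normal((37, 37))
    m = n_components_for_variance(X)
    print(m, "linear PCs retained out of", X.shape[1])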
Kambhatla and Leen (1997) introduce non-linearity in a different way, using a piecewise-linear or ‘local’ approach. The p-dimensional space defined
by the possible values of x is partitioned into Q regions, and linear PCs
are then found separately for each region. Kambhatla and Leen (1997) note
that this local PCA provides a faster algorithm than a global non-linear
neural network. A clustering algorithm is used to define the Q regions.
Roweis and Saul (2000) describe a locally linear embedding algorithm that
also generates local linear reconstructions of observations, this time based
on a set of ‘neighbours’ of each observation. Tarpey (2000) implements a
similar but more restricted idea. He looks separately at the first PCs within
two regions defined by the sign of the first PC for the whole data set, as a
means of determining the presence of non-linear structure in the data.
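A minimal sketch of this local PCA idea, assuming an ordinary k-means clustering to define the Q regions (the particular clustering algorithm, the value of Q and the toy data are illustrative choices, not necessarily Kambhatla and Leen’s), might look as follows.

    import numpy as np

    def local_pca(X, Q=3, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), Q, replace=False)]
        for _ in range(n_iter):                 # plain k-means clustering
            labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
            for j in range(Q):
                if np.any(labels == j):
                    centres[j] = X[labels == j].mean(axis=0)
        pcs = []
        for j in range(Q):                      # ordinary PCA within each region
            Xj = X[labels == j] - centres[j]
            _, _, Vt = np.linalg.svd(Xj, full_matrices=False)
            pcs.append(Vt)                      # rows are the local PC directions
        return labels, centres, pcs

    rng = np.random.default_rng(2)
    t = rng.uniform(0.0, 2.0 * np.pi, 300)
    X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.standard_normal((300, 2))
    labels, centres, pcs = local_pca(X, Q=4)
    print(pcs[0][0])    # first local PC direction within region 0

Replacing the clustering step by a split on the sign of the scores on the first PC of the whole data set, with Q = 2, gives a version of Tarpey’s more restricted scheme.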
14.1.4 Other Aspects of Non-Linearity
We saw in Section 5.3 that biplots can provide an informative way of displaying the results of a PCA. Modifications of these ‘classical’ biplots to become non-linear are discussed in detail by Gower and Hand (1996, Chap-

