Since Kramer's paper appeared, a number of authors in the neural network literature
have noted limitations to his procedure and have suggested alternatives or modifications
(see, for example, Jia et al., 2000), although it is now used in a range of disciplines
including climatology (Monahan, 2001). Dong and McAvoy (1996) propose an algorithm that
combines the principal curves of Section 14.1.2 (Hastie and Stuetzle, 1989) with the
autoassociative neural network set-up of Kramer (1991). Principal curves alone do not
allow the calculation of 'scores' with respect to the curves for new observations, but
their combination with a neural network enables such quantities to be computed.
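
The role of the network in providing scores for new observations can be illustrated
with a minimal modern sketch of Kramer's autoassociative (bottleneck) set-up. This is
not Dong and McAvoy's (1996) algorithm; it is a hedged toy example using PyTorch and
synthetic data, with all dimensions and layer sizes chosen arbitrarily. Once trained,
the encoder half maps any new observation directly to its non-linear score.

    import torch
    import torch.nn as nn

    # Toy illustration only: synthetic data, arbitrary architecture (assumptions).
    torch.manual_seed(0)
    n, p, k = 200, 6, 1                      # k non-linear 'scores' per observation
    X = torch.randn(n, p)                    # stand-in for a centred data matrix

    # Mapping layer -> bottleneck -> de-mapping layer; the bottleneck gives the scores.
    encoder = nn.Sequential(nn.Linear(p, 10), nn.Tanh(), nn.Linear(10, k))
    decoder = nn.Sequential(nn.Linear(k, 10), nn.Tanh(), nn.Linear(10, p))

    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=0.01)
    for _ in range(500):
        opt.zero_grad()
        loss = ((decoder(encoder(X)) - X) ** 2).mean()   # reconstruction error
        loss.backward()
        opt.step()

    # A trained encoder maps new observations directly to their non-linear scores.
    scores_new = encoder(torch.randn(3, p))
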
An alternative approach, based on a so-called input-training net, is suggested by Tan
and Mavrovouniotis (1995). In such networks, the inputs are not fixed, but are trained
along with the other parameters of the network. With a single input the results of the
algorithm are equivalent to principal curves, but with a larger number of inputs there
is increased flexibility to go beyond the additive model underlying principal curves.
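
As a rough illustration of the idea, not Tan and Mavrovouniotis's own implementation,
the sketch below treats the q network inputs for each observation as free parameters
and optimises them jointly with the decoder weights. PyTorch, the toy data and the
layer sizes are all assumptions made for the example.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, p, q = 200, 5, 2                      # q trainable inputs per observation
    X = torch.randn(n, p)                    # stand-in for a centred data matrix

    # Only the 'decoder' half is specified; the network inputs Z are free parameters.
    decoder = nn.Sequential(nn.Linear(q, 10), nn.Tanh(), nn.Linear(10, p))
    Z = nn.Parameter(torch.zeros(n, q))

    # Inputs and weights are trained together to reconstruct the data.
    opt = torch.optim.Adam(list(decoder.parameters()) + [Z], lr=0.01)
    for _ in range(500):
        opt.zero_grad()
        loss = ((decoder(Z) - X) ** 2).mean()
        loss.backward()
        opt.step()

With q = 1 the fitted values trace out a single curve through the data, in line with
the equivalence to principal curves noted above; larger q allows a more general surface.
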
Jia et al. (2000) use Tan and Mavrovouniotis's (1995) input-training net, but have an
ordinary linear PCA as a preliminary step. The non-linear algorithm is then conducted on
the first m linear PCs, where m is chosen to be sufficiently large that only PCs with
very small variances are excluded. Jia and coworkers suggest that around 97% of the total
variance should be retained to avoid discarding dimensions that might include important
non-linear variation. The non-linear components are used in process control (see Section
13.7), and in an example they give improved fault detection compared to linear PCs (Jia
et al., 2000). In that example the preliminary step reduces the dimensionality of the
data from 37 variables to 12 linear PCs, whilst retaining 98% of the variation.
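
A preliminary reduction of this kind is straightforward to sketch. The fragment below
is a hedged illustration using scikit-learn on synthetic data, not Jia et al.'s code:
it keeps the smallest number of linear PCs explaining at least 97% of the total
variance, and the non-linear (input-training) step would then operate on these scores.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 37))           # stand-in for 37 process variables

    # Retain the smallest number of PCs whose cumulative variance ratio is >= 0.97.
    pca = PCA(n_components=0.97, svd_solver='full')
    scores = pca.fit_transform(X)
    print(scores.shape[1], pca.explained_variance_ratio_.sum())
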
Kambhatla and Leen (1997) introduce non-linearity in a different way, using a
piecewise-linear or 'local' approach. The p-dimensional space defined by the possible
values of x is partitioned into Q regions, and linear PCs are then found separately for
each region. Kambhatla and Leen (1997) note that this local PCA provides a faster
algorithm than a global non-linear neural network. A clustering algorithm is used to
define the Q regions. Roweis and Saul (2000) describe a locally linear embedding
algorithm that also generates local linear reconstructions of observations, this time
based on a set of 'neighbours' of each observation. Tarpey (2000) implements a similar
but more restricted idea. He looks separately at the first PCs within two regions
defined by the sign of the first PC for the whole data set, as a means of determining
the presence of non-linear structure in the data.
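
A minimal version of the local PCA idea might look as follows. This is a sketch under
assumed choices, k-means clustering, Q = 3 regions and two retained PCs per region on
synthetic data, rather than Kambhatla and Leen's exact algorithm.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 4))            # stand-in for data on p = 4 variables

    # Partition the observations into Q regions with a clustering algorithm.
    Q = 3
    labels = KMeans(n_clusters=Q, n_init=10, random_state=0).fit_predict(X)

    # A separate linear PCA within each region gives a piecewise-linear description.
    local_pcas = {q: PCA(n_components=2).fit(X[labels == q]) for q in range(Q)}
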

14.1.4 Other Aspects of Non-Linearity

We saw in Section 5.3 that biplots can provide an informative way of displaying the
results of a PCA. Modifications of these 'classical' biplots to become non-linear are
discussed in detail by Gower and Hand (1996, Chap-