in some way to make the optimization problem tractable. One choice is to use step functions, which leads back towards Gifi's (1990) system of non-linear PCA. Besse and Ferraty (1995) favour an approach based on splines. They contrast their proposal, in which flexibility of the functional transformation is controlled by the choice of smoothing parameters, with earlier spline-based procedures controlled by the number and positioning of knots (see, for example, van Rijckevorsel (1988) and Winsberg (1988)). Using splines as Besse and Ferraty do is equivalent to adding a roughness penalty function to the quantity to be minimized. This is similar to Besse et al.'s (1997) approach to analysing functional data described in Section 12.3.4 using equation (12.3.6).
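To make the contrast between knot-based and penalty-based control of flexibility concrete, here is a small illustration using scipy's spline routines. It is only a sketch of the mechanics on toy data, not Besse and Ferraty's (1995) procedure; the data, knot positions and smoothing value are all invented for illustration.

    import numpy as np
    from scipy.interpolate import LSQUnivariateSpline, UnivariateSpline

    # toy data: a noisy non-linear relationship between x and y
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 10.0, 200))
    y = np.log1p(x) + rng.normal(scale=0.1, size=x.size)

    # flexibility controlled by the number and positioning of knots
    interior_knots = np.linspace(1.0, 9.0, 5)
    f_knots = LSQUnivariateSpline(x, y, interior_knots)

    # flexibility controlled by a single smoothing (roughness) parameter;
    # larger s imposes a heavier roughness penalty and gives a smoother fit
    f_smooth = UnivariateSpline(x, y, s=0.5)

In Besse and Ferraty's setting the role played here by s is taken by smoothing parameters that are chosen jointly with the dimension q, as described next.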
As with Gifi's (1990) non-linear PCA, Besse and Ferraty's (1995) proposal is implemented by means of an alternating least squares algorithm and, as in Besse and de Falguerolles (1993) for the linear case (see Section 6.1.5), bootstrapping of residuals from a q-dimensional model is used to decide on the best fit. Here, instead of simply using the bootstrap to choose q, simultaneous optimization with respect to q and with respect to the smoothing parameters which determine the function f(x) is needed. At this stage it might be asked 'where is the PCA in all this?' The name 'PCA' is still appropriate because the q-dimensional subspace is determined by an optimal set of q linear functions of the vector of transformed random variables f(x), and it is these linear functions that are the non-linear PCs.
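As a rough indication of what such an alternating scheme involves, the following sketch alternates a rank-q PCA step with a smoothing-spline update of each transformation. It is a naive illustration under invented choices (a single smoothing parameter s, standardized transformed variables, a fixed number of iterations); Besse and Ferraty's (1995) algorithm, its penalty and its bootstrap-based choice of q and the smoothing parameters are considerably more refined.

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    def nonlinear_pca_als(X, q=2, s=1.0, n_iter=20):
        # X: n x p data matrix; assumes no constant columns
        n, p = X.shape
        F = (X - X.mean(0)) / X.std(0)       # initial transforms: standardized x_j
        order = np.argsort(X, axis=0)        # column-wise sort orders for spline fits
        for _ in range(n_iter):
            # PCA step: best rank-q linear reconstruction of the current f(x)
            Fc = F - F.mean(0)
            U, d, Vt = np.linalg.svd(Fc, full_matrices=False)
            Fhat = (U[:, :q] * d[:q]) @ Vt[:q] + F.mean(0)
            # transformation step: re-express each reconstructed column as a
            # smooth function of the original variable; s acts as the
            # roughness penalty controlling flexibility
            for j in range(p):
                idx = order[:, j]
                spl = UnivariateSpline(X[idx, j], Fhat[idx, j], s=s * n)
                fj = spl(X[:, j])
                F[:, j] = (fj - fj.mean()) / fj.std()
        # the non-linear PCs are linear functions of the transformed variables f(x)
        Fc = F - F.mean(0)
        U, d, Vt = np.linalg.svd(Fc, full_matrices=False)
        return F, Vt[:q], U[:, :q] * d[:q]   # transforms, loadings, q non-linear PCs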



14.1.2 Additive Principal Components and Principal Curves

Fowlkes and Kettenring (1985) note that one possible objective for transforming data before performing a PCA is to find near-singularities in the transformed data. In other words, $x = (x_1, x_2, \ldots, x_p)$ is transformed to $f(x) = (f_1(x_1), f_2(x_2), \ldots, f_p(x_p))$, and we are interested in finding linear functions $a'f(x)$ of $f(x)$ for which $\mathrm{var}[a'f(x)] \approx 0$. Fowlkes and Kettenring (1985) suggest looking for a transformation that minimizes the determinant of the correlation matrix of the transformed variables. The last few PCs derived from this correlation matrix should then identify the required near-constant relationships, if any exist.
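A minimal sketch of this idea, restricted for illustration to a hypothetical grid of power transformations of strictly positive variables (Fowlkes and Kettenring consider more general transformations), might look as follows: the grid is searched for the combination of powers minimizing the determinant of the correlation matrix of the transformed data, and the smallest eigenvalue and its eigenvector of that matrix then describe the near-constant linear function $a'f(x)$. The function name and the candidate powers are invented for the example.

    import itertools
    import numpy as np

    def min_det_power_transform(X, exponents=(0.5, 1.0, 2.0)):
        # X: n x p matrix of strictly positive data; exponents: candidate powers
        n, p = X.shape
        best_det, best_lams = np.inf, None
        for lams in itertools.product(exponents, repeat=p):
            F = np.column_stack([X[:, j] ** lams[j] for j in range(p)])
            det = np.linalg.det(np.corrcoef(F, rowvar=False))
            if det < best_det:
                best_det, best_lams = det, lams
        F = np.column_stack([X[:, j] ** best_lams[j] for j in range(p)])
        evals, evecs = np.linalg.eigh(np.corrcoef(F, rowvar=False))
        # evals are ascending: evals[0] is the variance of the last PC of the
        # transformed, standardized variables; a value near zero signals a
        # near-constant relationship among the f_j(x_j)
        return best_lams, evals[0], evecs[:, 0]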
A similar idea underlies additive principal components, which are discussed in detail by Donnell et al. (1994). The additive principal components take the form $\sum_{j=1}^{p} \phi_j(x_j)$ instead of $\sum_{j=1}^{p} a_j x_j$ in standard PCA, and, as with Fowlkes and Kettenring (1985), interest centres on components for which $\mathrm{var}[\sum_{j=1}^{p} \phi_j(x_j)]$ is small. To define a non-linear analogue of PCA there is a choice of either an algebraic definition that minimizes variance, or a geometric definition that optimizes expected squared distance from the additive manifold $\sum_{j=1}^{p} \phi_j(x_j) = \text{const}$. Once we move away from linear PCA, the two definitions lead to different solutions, and Donnell et al. (1994) choose to minimize variance. The optimization problem to be