in some way to make the optimization problem tractable. One choice is to
use step functions, which leads back towards Gifi’s (1990) system of non-
linear PCA. Besse and Ferraty (1995) favour an approach based on splines.
They contrast their proposal, in which flexibility of the functional trans-
formation is controlled by the choice of smoothing parameters, with earlier
spline-based procedures controlled by the number and positioning of knots
(see, for example, van Rijckevorsel (1988) and Winsberg (1988)). Using
splines as Besse and Ferraty do is equivalent to adding a roughness penalty
function to the quantity to be minimized. This is similar to Besse et al.’s
(1997) approach to analysing functional data described in Section 12.3.4
using equation (12.3.6).
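As a schematic illustration only, and not Besse and Ferraty's (1995) actual criterion, a spline roughness penalty of the usual kind adds to a lack-of-fit term one penalty per transformation, each controlled by its own smoothing parameter:

\[
  \text{lack of fit}(f_1, \ldots, f_p)
  \;+\; \sum_{j=1}^{p} \lambda_j \int \{ f_j''(t) \}^{2}\, dt .
\]

Larger values of the (assumed) smoothing parameters $\lambda_j$ force smoother transformations $f_j$, which is the sense in which flexibility is governed by smoothing parameters rather than by the number and position of knots.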
As with Gifi’s (1990) non-linear PCA, Besse and Ferraty’s (1995) pro-
posal is implemented by means of an alternating least squares algorithm
and, as in Besse and de Falguerolles (1993) for the linear case (see Sec-
tion 6.1.5), bootstrapping of residuals from a q-dimensional model is used
to decide on the best fit. Here, instead of simply using the bootstrap to
choose q, simultaneous optimization with respect to q and with respect to the
smoothing parameters which determine the function f(x) is needed. At this
stage it might be asked ‘where is the PCA in all this?’ The name ‘PCA’
is still appropriate because the q-dimensional subspace is determined by
an optimal set of q linear functions of the vector of transformed random
variables f(x), and it is these linear functions that are the non-linear PCs.
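The following is a minimal sketch of how such an alternating scheme might be organized; it is not Besse and Ferraty's (1995) implementation, and the function name, the choice of scipy's UnivariateSpline for the smoothing step, and the single fixed smoothing parameter are all illustrative assumptions.

import numpy as np
from scipy.interpolate import UnivariateSpline

def spline_nonlinear_pca(X, q=2, smooth=1.0, n_iter=20):
    # Illustrative alternating least squares sketch: alternate a rank-q
    # linear fit of the transformed data with a smoothing-spline update
    # of each transformation f_j.
    n, p = X.shape
    F = (X - X.mean(axis=0)) / X.std(axis=0)      # start from standardized x_j
    for _ in range(n_iter):
        # Step 1: best rank-q linear approximation of the transformed data.
        mean_F = F.mean(axis=0)
        U, s, Vt = np.linalg.svd(F - mean_F, full_matrices=False)
        Z = U[:, :q] * s[:q]                      # component scores
        F_hat = mean_F + Z @ Vt[:q]               # rank-q reconstruction
        # Step 2: update each f_j by smoothing the reconstruction on x_j.
        for j in range(p):
            order = np.argsort(X[:, j])
            spl = UnivariateSpline(X[order, j], F_hat[order, j], s=smooth * n)
            f_j = spl(X[:, j])
            F[:, j] = (f_j - f_j.mean()) / f_j.std()   # keep f_j standardized
    return Z, Vt[:q], F                           # scores, loadings, transforms

The bootstrapping of residuals described above would then be wrapped around such a fit to choose q and the smoothing parameters simultaneously; that outer loop is omitted from the sketch.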
14.1.2 Additive Principal Components and Principal Curves
Fowlkes and Kettenring (1985) note that one possible objective for trans-
forming data before performing a PCA is to find near-singularities in the
transformed data. In other words, $x = (x_1, x_2, \ldots, x_p)$ is transformed to
$f(x) = (f_1(x_1), f_2(x_2), \ldots, f_p(x_p))$, and we are interested in finding linear
functions $a'f(x)$ of $f(x)$ for which $\mathrm{var}[a'f(x)] \approx 0$. Fowlkes and Kettenring
(1985) suggest looking for a transformation that minimizes the determi-
nant of the correlation matrix of the transformed variables. The last few
PCs derived from this correlation matrix should then identify the required
near-constant relationships, if any exist.
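A brief sketch of this diagnostic step, assuming the transformed variables are already available as the columns of an array F (the search for the transformation itself is not shown, and the function name and tolerance are illustrative):

import numpy as np

def near_constant_relationships(F, tol=0.05):
    # Eigen-decomposition of the correlation matrix of the transformed
    # variables; eigenvectors with eigenvalues near zero correspond to
    # near-constant linear relationships among the standardized f_j(x_j).
    R = np.corrcoef(F, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
    keep = eigvals < tol
    return eigvals[keep], eigvecs[:, keep]

The last few PCs of the correlation matrix are precisely these small-eigenvalue directions, so the sketch simply returns them together with their variances.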
A similar idea underlies additive principal components, which are dis-
cussed in detail by Donnell et al. (1994). The additive principal components
take the form $\sum_{j=1}^{p} \phi_j(x_j)$ instead of $\sum_{j=1}^{p} a_j x_j$ in standard PCA, and, as
with Fowlkes and Kettenring (1985), interest centres on components for
which $\mathrm{var}[\sum_{j=1}^{p} \phi_j(x_j)]$ is small. To define a non-linear analogue of PCA
there is a choice of either an algebraic definition that minimizes variance,
or a geometric definition that optimizes expected squared distance from
the additive manifold $\sum_{j=1}^{p} \phi_j(x_j) = \mathrm{const}$. Once we move away from lin-
ear PCA, the two definitions lead to different solutions, and Donnell et
al. (1994) choose to minimize variance. The optimization problem to be

