in some way to make the optimization problem tractable. One choice is to
use step functions, which leads back towards Gifi’s (1990) system of non-
linear PCA. Besse and Ferraty (1995) favour an approach based on splines.
They contrast their proposal, in which flexibility of the functional trans-
formation is controlled by the choice of smoothing parameters, with earlier
spline-based procedures controlled by the number and positioning of knots
(see, for example, van Rijckevorsel (1988) and Winsberg (1988)). Using
splines as Besse and Ferraty do is equivalent to adding a roughness penalty
function to the quantity to be minimized. This is similar to Besse et al.’s
(1997) approach to analysing functional data described in Section 12.3.4
using equation (12.3.6).
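As a schematic illustration only, and not Besse and Ferraty's (1995) actual criterion, a spline roughness penalty of the usual kind adds to a lack-of-fit term one penalty per transformation, each controlled by its own smoothing parameter:

\[
  \text{lack of fit}(f_1, \ldots, f_p)
  \;+\; \sum_{j=1}^{p} \lambda_j \int \{ f_j''(t) \}^{2}\, dt .
\]

Larger values of the (assumed) smoothing parameters $\lambda_j$ force smoother transformations $f_j$, which is the sense in which flexibility is governed by smoothing parameters rather than by the number and position of knots.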
As with Gifi’s (1990) non-linear PCA, Besse and Ferraty’s (1995) pro-
posal is implemented by means of an alternating least squares algorithm
and, as in Besse and de Falguerolles (1993) for the linear case (see Sec-
tion 6.1.5), bootstrapping of residuals from a q-dimensional model is used
to decide on the best fit. Here, instead of simply using the bootstrap to
choose q, simultaneous optimization with respect to q and with respect to the
smoothing parameters which determine the function f(x) is needed. At this
stage it might be asked ‘where is the PCA in all this?’ The name ‘PCA’
is still appropriate because the q-dimensional subspace is determined by
an optimal set of q linear functions of the vector of transformed random
variables f(x), and it is these linear functions that are the non-linear PCs.
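The following is a minimal sketch of how such an alternating scheme might be organized; it is not Besse and Ferraty's (1995) implementation, and the function name, the choice of scipy's UnivariateSpline for the smoothing step, and the single fixed smoothing parameter are all illustrative assumptions.

import numpy as np
from scipy.interpolate import UnivariateSpline

def spline_nonlinear_pca(X, q=2, smooth=1.0, n_iter=20):
    # Illustrative alternating least squares sketch: alternate a rank-q
    # linear fit of the transformed data with a smoothing-spline update
    # of each transformation f_j.
    n, p = X.shape
    F = (X - X.mean(axis=0)) / X.std(axis=0)      # start from standardized x_j
    for _ in range(n_iter):
        # Step 1: best rank-q linear approximation of the transformed data.
        mean_F = F.mean(axis=0)
        U, s, Vt = np.linalg.svd(F - mean_F, full_matrices=False)
        Z = U[:, :q] * s[:q]                      # component scores
        F_hat = mean_F + Z @ Vt[:q]               # rank-q reconstruction
        # Step 2: update each f_j by smoothing the reconstruction on x_j.
        for j in range(p):
            order = np.argsort(X[:, j])
            spl = UnivariateSpline(X[order, j], F_hat[order, j], s=smooth * n)
            f_j = spl(X[:, j])
            F[:, j] = (f_j - f_j.mean()) / f_j.std()   # keep f_j standardized
    return Z, Vt[:q], F                           # scores, loadings, transforms

The bootstrapping of residuals described above would then be wrapped around such a fit to choose q and the smoothing parameters simultaneously; that outer loop is omitted from the sketch.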
14.1.2 Additive Principal Components and Principal Curves
Fowlkes and Kettenring (1985) note that one possible objective for trans-
forming data before performing a PCA is to find near-singularities in the
transformed data. In other words, $x = (x_1, x_2, \ldots, x_p)$ is transformed to
$f(x) = (f_1(x_1), f_2(x_2), \ldots, f_p(x_p))$, and we are interested in finding linear
functions $a'f(x)$ of $f(x)$ for which $\mathrm{var}[a'f(x)] \approx 0$. Fowlkes and Kettenring
(1985) suggest looking for a transformation that minimizes the determi-
nant of the correlation matrix of the transformed variables. The last few
PCs derived from this correlation matrix should then identify the required
near-constant relationships, if any exist.
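A brief sketch of this diagnostic step, assuming the transformed variables are already available as the columns of an array F (the search for the transformation itself is not shown, and the function name and tolerance are illustrative):

import numpy as np

def near_constant_relationships(F, tol=0.05):
    # Eigen-decomposition of the correlation matrix of the transformed
    # variables; eigenvectors with eigenvalues near zero correspond to
    # near-constant linear relationships among the standardized f_j(x_j).
    R = np.corrcoef(F, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
    keep = eigvals < tol
    return eigvals[keep], eigvecs[:, keep]

The last few PCs of the correlation matrix are precisely these small-eigenvalue directions, so the sketch simply returns them together with their variances.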
A similar idea underlies additive principal components, which are dis-
cussed in detail by Donnell et al. (1994). The additive principal components
take the form $\sum_{j=1}^{p} \phi_j(x_j)$ instead of $\sum_{j=1}^{p} a_j x_j$ in standard PCA, and, as
with Fowlkes and Kettenring (1985), interest centres on components for
which $\mathrm{var}[\sum_{j=1}^{p} \phi_j(x_j)]$ is small. To define a non-linear analogue of PCA
there is a choice of either an algebraic definition that minimizes variance,
or a geometric definition that optimizes expected squared distance from
the additive manifold $\sum_{j=1}^{p} \phi_j(x_j) = \mathrm{const}$. Once we move away from lin-
ear PCA, the two definitions lead to different solutions, and Donnell et
al. (1994) choose to minimize variance. The optimization problem to be

