solved is then to successively find p-variate vectors φ⁽ᵏ⁾, k = 1, 2, …, whose elements are φⱼ⁽ᵏ⁾(xⱼ), which minimize

$$\operatorname{var}\left[\sum_{j=1}^{p} \phi_j^{(k)}(x_j)\right]$$

subject to $\sum_{j=1}^{p} \operatorname{var}[\phi_j^{(k)}(x_j)] = 1$ and, for $k > 1$, $l < k$,

$$\sum_{j=1}^{p} \operatorname{cov}[\phi_j^{(k)}(x_j),\, \phi_j^{(l)}(x_j)] = 0.$$
As with linear PCA, this reduces to an eigenvalue problem. The main choice to be made is the set of functions φ(·) over which optimization is to take place. In an example, Donnell et al. (1994) use splines, but their theoretical results are quite general and they discuss other, more sophisticated, smoothers. They identify two main uses for low-variance additive principal components, namely to fit additive implicit equations to data and to identify the presence of ‘concurvities,’ which play the same rôle and cause the same problems in additive regression as do collinearities in linear regression.
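Since, as just noted, the problem reduces to an eigenvalue problem, a small numerical sketch may help. The Python code below is an illustration only, not Donnell et al.'s implementation: a low-order polynomial basis stands in for their splines, the data are simulated with a built-in concurvity, and the tiny ridge on the constraint matrix is added purely for numerical safety. Writing each φⱼ(xⱼ) as a basis expansion Φⱼaⱼ turns var[Σⱼ φⱼ(xⱼ)] into a′Ca and Σⱼ var[φⱼ(xⱼ)] into a′Da, so the lowest-variance additive principal component is the generalized eigenvector of (C, D) with smallest eigenvalue, and later components come from the next eigenvectors, which automatically satisfy the covariance constraint.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

def poly_basis(x, degree=3):
    """Centered polynomial basis for one variable (a stand-in for splines)."""
    B = np.column_stack([x**d for d in range(1, degree + 1)])
    return B - B.mean(axis=0)

# Simulated data with a near-exact additive relation x0**2 + x1 - x2 = 0,
# the kind of 'concurvity' a low-variance additive PC should detect.
n = 500
x0 = rng.uniform(-1, 1, n)
x1 = rng.uniform(-1, 1, n)
x2 = x0**2 + x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

p, degree = X.shape[1], 3
blocks = [poly_basis(X[:, j], degree) for j in range(p)]
Phi = np.hstack(blocks)              # n x (p * degree) design matrix

C = np.cov(Phi, rowvar=False)        # a' C a = var[sum_j phi_j(x_j)]
D = np.zeros_like(C)                 # block-diag: a' D a = sum_j var[phi_j(x_j)]
for j in range(p):
    s = slice(j * degree, (j + 1) * degree)
    D[s, s] = np.cov(blocks[j], rowvar=False)
D += 1e-8 * np.eye(D.shape[0])       # tiny ridge so D is positive definite

# Generalized eigenproblem C a = ev * D a; eigh returns eigenvalues ascending,
# so the first column is the lowest-variance additive principal component.
evals, evecs = eigh(C, D)
a = evecs[:, 0]                      # its coefficient blocks define each phi_j
print("variance of smallest additive PC:", evals[0])   # near zero: a concurvity
```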
Principal curves are included in the same section as additive principal components despite the insistence of Donnell and coworkers, in their response to Flury's discussion of their paper, that the two techniques are very different. One difference is that although the range of functions allowed in additive principal components is wide, an equation is found relating the variables via the functions φⱼ(xⱼ), whereas a principal curve is just that: a smooth curve, with no necessity for a parametric equation. A second difference is that additive principal components concentrate on low-variance relationships, whereas principal curves minimize variation orthogonal to the curve.
There is nevertheless a similarity between the two techniques, in that
both replace an optimum line or plane produced by linear PCA by an
optimal non-linear curve or surface. In the case of principal curves, a smooth
one-dimensional curve is sought that passes through the ‘middle’ of the data
set. With an appropriate definition of ‘middle,’ the first PC gives the best
straight line through the middle of the data, and principal curves generalize
this using the idea of self-consistency, which was introduced at the end of
Section 2.2. We saw there that, for p-variate random vectors x, y, the vector of random variables y is self-consistent for x if E[x | y] = y. Consider
a smooth curve in the p-dimensional space defined by x. The curve can be
written f(λ), where λ defines the position along the curve, and the vector
f(λ) contains the values of the elements of x for a given value of λ. A curve
f(λ) is self-consistent, that is, a principal curve, if E[x | f⁻¹(x) = λ] = f(λ), where f⁻¹(x) is the value of λ for which ‖x − f(λ)‖ is minimized. What this means intuitively is that, for any given value of λ, say λ₀, the average of all values of x that have f(λ₀) as their closest point on the curve is precisely f(λ₀).
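The self-consistency condition suggests an alternating fitting scheme in the spirit of Hastie and Stuetzle's principal-curve algorithm: a conditional-expectation step that smooths each coordinate of x against λ, and a projection step that reassigns each observation the λ of its nearest point on the current curve. The Python sketch below illustrates only that alternation, not the published algorithm: the moving-average smoother, the discrete projection onto fitted points, the fixed iteration count, and the half-circle test data are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def smooth(lam, y, span=0.2):
    """Moving-average smoother: a crude estimate of E[y | lambda]."""
    order = np.argsort(lam)
    k = max(3, int(span * len(lam)))
    out = np.empty_like(y, dtype=float)
    for rank, i in enumerate(order):
        lo, hi = max(0, rank - k // 2), min(len(lam), rank + k // 2 + 1)
        out[i] = y[order[lo:hi]].mean()
    return out

def principal_curve(X, n_iter=20):
    """Alternate projection and smoothing, starting from the first PC line."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ Vt[0]                  # initial lambda: scores on the first PC
    for _ in range(n_iter):
        # conditional-expectation step: f(lambda) = E[x | lambda], coordinatewise
        f = np.column_stack([smooth(lam, Xc[:, j]) for j in range(Xc.shape[1])])
        # projection step: lambda_i = arc-length position of the fitted point
        # nearest to x_i (a discrete stand-in for projecting onto the curve)
        order = np.argsort(lam)
        arc = np.r_[0, np.cumsum(np.linalg.norm(np.diff(f[order], axis=0), axis=1))]
        d2 = ((Xc[:, None, :] - f[None, order, :]) ** 2).sum(axis=-1)
        lam = arc[d2.argmin(axis=1)]
    return f + X.mean(axis=0), lam

# Noisy half-circle: the first PC is a straight chord through the data,
# while the fitted curve should follow the arc through its 'middle'.
t = rng.uniform(0, np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(300, 2))
f, lam = principal_curve(X)
```

On data like this half-circle, the first PC is a straight chord, while the fitted points trace the arc; each fitted value f(λ₀) ends up close to the average of the observations projecting to it, which mirrors the self-consistency property described above.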

