Page 413 - Jolliffe I. Principal Component Analysis

14. Generalizations and Adaptations of Principal Component Analysis
                               solved is then to successively find p-variate vectors φ^(k), k = 1, 2, ...,
                               whose elements are φ_j^(k)(x_j), which minimize

                                              var[ Σ_{j=1}^p φ_j^(k)(x_j) ],

                               subject to Σ_{j=1}^p var[φ_j^(k)(x_j)] = 1 and, for k > 1 and l < k,

                                              Σ_{j=1}^p cov[φ_j^(k)(x_j), φ_j^(l)(x_j)] = 0.
                                As with linear PCA, this reduces to an eigenvalue problem. The main
                              choice to be made is the set of functions φ(.) over which optimization is to
                              take place. In an example Donnell et al. (1994) use splines, but their theo-
                              retical results are quite general and they discuss other, more sophisticated,
                              smoothers. They identify two main uses for low-variance additive principal
                              components, namely to fit additive implicit equations to data and to iden-
                               tify the presence of ‘concurvities,’ which play the same rôle and cause the
                              same problems in additive regression as do collinearities in linear regression.
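The reduction to an eigenvalue problem can be made concrete with a small numerical sketch. Here each φ_j is restricted to a centered polynomial basis, a crude stand-in for the splines of Donnell et al. (1994); writing φ_j = B_j c_j, minimizing var[Σ_j φ_j(x_j)] subject to Σ_j var[φ_j(x_j)] = 1 becomes a generalized eigenproblem. The data, basis, and variable names below are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data containing a hidden additive relation x2 + x1**2 ~ 0 (a "concurvity").
n = 500
x1 = rng.normal(size=n)
x2 = -x1**2 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])

def basis(col, degree=3):
    # Centered polynomial basis: a crude stand-in for the spline smoothers
    # used by Donnell et al. (1994).
    B = np.column_stack([col**d for d in range(1, degree + 1)])
    return B - B.mean(axis=0)

blocks = [basis(X[:, j]) for j in range(X.shape[1])]
B = np.hstack(blocks)

# With phi_j = B_j c_j:  var(sum_j phi_j) = c' A c  and  sum_j var(phi_j) = c' D c,
# where A is the covariance of the stacked basis and D keeps only the
# within-variable (block-diagonal) part.
A = (B.T @ B) / n
D = np.zeros_like(A)
start = 0
for Bj in blocks:
    m = Bj.shape[1]
    D[start:start + m, start:start + m] = (Bj.T @ Bj) / n
    start += m

# Generalized eigenproblem A c = mu D c, reduced to an ordinary symmetric
# eigenproblem via the Cholesky factor of D.
L = np.linalg.cholesky(D)
Linv = np.linalg.inv(L)
mu, Y = np.linalg.eigh(Linv @ A @ Linv.T)
print("smallest additive-PC variance:", mu[0])  # near zero: relation detected
```

The smallest eigenvalue is the variance of the lowest-variance additive principal component; a value near zero flags an (approximate) additive implicit equation among the variables.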
                                Principal curves are included in the same section as additive principal
                              components despite the insistence by Donnell and coworkers in a response
                              to discussion of their paper by Flury that they are very different. One dif-
                              ference is that although the range of functions allowed in additive principal
                              components is wide, an equation is found relating the variables via the
                              functions φ j (x j ), whereas a principal curve is just that, a smooth curve
                              with no necessity for a parametric equation. A second difference is that
                              additive principal components concentrate on low-variance relationships,
                              while principal curves minimize variation orthogonal to the curve.
                                There is nevertheless a similarity between the two techniques, in that
                              both replace an optimum line or plane produced by linear PCA by an
                              optimal non-linear curve or surface. In the case of principal curves, a smooth
                              one-dimensional curve is sought that passes through the ‘middle’ of the data
                              set. With an appropriate definition of ‘middle,’ the first PC gives the best
                              straight line through the middle of the data, and principal curves generalize
                              this using the idea of self-consistency, which was introduced at the end of
                               Section 2.2. We saw there that, for p-variate random vectors x, y, the
                               vector of random variables y is self-consistent for x if E[x|y] = y. Consider
                              a smooth curve in the p-dimensional space defined by x. The curve can be
                              written f(λ), where λ defines the position along the curve, and the vector
                              f(λ) contains the values of the elements of x for a given value of λ. A curve
                               f(λ) is self-consistent, that is, a principal curve, if E[x | f⁻¹(x) = λ] = f(λ),
                               where f⁻¹(x) is the value of λ for which ‖x − f(λ)‖ is minimized. What this
                               means intuitively is that, for any given value of λ, say λ₀, the average of all
                               values of x that have f(λ₀) as their closest point on the curve is precisely
                               f(λ₀).
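The self-consistency property suggests the usual alternating fit for principal curves: smooth each coordinate of x against the current λ to update f, then re-project each point onto the fitted curve to update λ. Below is a minimal numpy sketch of that iteration, with a running-mean smoother standing in for the spline or kernel smoothers normally used; the data and all parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy observations along a parabola; the first (linear) PC cannot follow it.
n = 300
t = rng.uniform(-1, 1, n)
X = np.column_stack([t, t**2]) + 0.05 * rng.normal(size=(n, 2))
Xc = X - X.mean(axis=0)

def smooth(lam, y, span=30):
    # Running-mean smoother of y against lam (a crude stand-in for a spline).
    order = np.argsort(lam)
    ys = y[order]
    out = np.empty_like(ys)
    for i in range(len(ys)):
        lo, hi = max(0, i - span // 2), min(len(ys), i + span // 2)
        out[i] = ys[lo:hi].mean()
    res = np.empty_like(out)
    res[order] = out
    return res

# Initialize lambda with the first linear PC score.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
lam = Xc @ Vt[0]

for _ in range(10):
    # Conditional-expectation step: f(lambda) = smoothed average of x given lambda.
    f = np.column_stack([smooth(lam, Xc[:, j]) for j in range(Xc.shape[1])])
    # Projection step: new lambda = arc length to the closest fitted curve point.
    order = np.argsort(lam)
    curve = f[order]
    seg = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(curve, axis=0), axis=1))]
    d2 = ((Xc[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
    lam = seg[np.argmin(d2, axis=1)]

print("mean squared orthogonal distance:", d2.min(axis=1).mean())
```

At convergence the curve approximately satisfies the self-consistency condition above: each fitted point f(λ₀) is close to the average of the observations that project onto it, and the orthogonal residual variance is small, in contrast to the straight-line fit of the first PC.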