Page 51 - Jolliffe I. Principal Component Analysis

2. Properties of Population Principal Components
                              The cross-product terms disappear because of the independence of $x_1$, $x_2$,
                              and hence of $y_1$, $y_2$.
                                Now, for $i = 1, 2$, we have

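$$
\begin{aligned}
E[(y_i - B'\mu)'(y_i - B'\mu)] &= E\{\operatorname{tr}[(y_i - B'\mu)'(y_i - B'\mu)]\} \\
&= E\{\operatorname{tr}[(y_i - B'\mu)(y_i - B'\mu)']\} \\
&= \operatorname{tr}\{E[(y_i - B'\mu)(y_i - B'\mu)']\} \\
&= \operatorname{tr}(B'\Sigma B).
\end{aligned}
$$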


                              But $\operatorname{tr}(B'\Sigma B)$ is maximized when $B = A_q$, from Property A1, and the
                              present criterion has been shown above to be $2\operatorname{tr}(B'\Sigma B)$. Hence Property
                              G2 is proved.
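The identity and the maximization can be checked numerically. The following is a minimal sketch, not part of the original proof: the mean `mu`, the covariance matrix `Sigma`, the dimensions and the sample size are hypothetical, and a multivariate normal distribution is assumed purely for convenience of simulation (Property G2 itself does not require normality).

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 5, 2, 200_000

# Hypothetical population parameters (not taken from the text).
mu = rng.standard_normal(p)
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)          # symmetric positive definite

# Eigenvectors of Sigma, reordered so the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(Sigma)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
A_q = eigvecs[:, :q]                 # coefficients of the first q PCs


def expected_sq_distance(B):
    """Population value of E[(y1 - y2)'(y1 - y2)], i.e. 2 tr(B' Sigma B)."""
    return 2.0 * np.trace(B.T @ Sigma @ B)


# Monte Carlo check of the identity with independent x1, x2 and B = A_q.
x1 = rng.multivariate_normal(mu, Sigma, size=n)
x2 = rng.multivariate_normal(mu, Sigma, size=n)
d = (x1 - x2) @ A_q                  # each row is (y1 - y2)'
mc_estimate = np.mean(np.sum(d**2, axis=1))

# Compare A_q with randomly drawn B having orthonormal columns (via QR).
best = expected_sq_distance(A_q)
others = [
    expected_sq_distance(np.linalg.qr(rng.standard_normal((p, q)))[0])
    for _ in range(1000)
]
print(best, mc_estimate, max(others))
```

Under these assumptions the simulated mean squared distance should agree with $2\operatorname{tr}(A_q'\Sigma A_q)$ up to sampling error, and no randomly drawn orthonormal $B$ should exceed the value attained at $B = A_q$.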
                                There is a closely related property whose geometric interpretation is more
                              tenuous, namely that with the same definitions as in Property G2,


$$
\det\{E[(y_1 - y_2)(y_1 - y_2)']\}
$$

                              is maximized when $B = A_q$ (see McCabe (1984)). This property says that
                              $B = A_q$ makes the generalized variance of $(y_1 - y_2)$ as large as possible.
                              Generalized variance may be viewed as an alternative measure of distance
                              apart of $y_1$ and $y_2$ in $q$-dimensional space, though a less intuitively obvious
                              measure than expected squared Euclidean distance.
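As an illustration of the determinant criterion, note that independence and equal means give $E[(y_1 - y_2)(y_1 - y_2)'] = 2B'\Sigma B$, so the generalized variance can be evaluated directly from $\Sigma$. The sketch below uses a hypothetical `Sigma` and compares $B = A_q$ with randomly generated orthonormal alternatives; it is a numerical spot check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 6, 3

# Hypothetical covariance matrix (symmetric positive definite).
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)

# Eigen-decomposition, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(Sigma)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
A_q = eigvecs[:, :q]


def generalized_variance(B):
    """det of the covariance matrix of (y1 - y2), which equals 2 B' Sigma B."""
    return np.linalg.det(2.0 * B.T @ Sigma @ B)


gv_pcs = generalized_variance(A_q)
gv_random = [
    generalized_variance(np.linalg.qr(rng.standard_normal((p, q)))[0])
    for _ in range(1000)
]
print(gv_pcs, max(gv_random))
```

At $B = A_q$ the determinant reduces to $2^q$ times the product of the $q$ largest eigenvalues of $\Sigma$, which the random alternatives should not exceed.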

                                Finally, Property G2 can be reversed in the sense that if $E[(y_1 - y_2)'(y_1 - y_2)]$
                              or $\det\{E[(y_1 - y_2)(y_1 - y_2)']\}$ is to be minimized, then this can be
                              achieved by taking $B = A_q^*$.
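The reversed property can be spot-checked in the same way. This sketch assumes that $A_q^*$ denotes the matrix whose columns are the eigenvectors of $\Sigma$ belonging to its $q$ smallest eigenvalues (the last $q$ PCs); `Sigma` and the dimensions are again hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 6, 2

# Hypothetical covariance matrix (symmetric positive definite).
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)

# eigh returns eigenvalues in ascending order, so the first q columns
# span the same subspace as the last q PCs (assumed meaning of A_q^*).
eigvals, eigvecs = np.linalg.eigh(Sigma)
A_star_q = eigvecs[:, :q]


def trace_criterion(B):
    """E[(y1 - y2)'(y1 - y2)] = 2 tr(B' Sigma B)."""
    return 2.0 * np.trace(B.T @ Sigma @ B)


def det_criterion(B):
    """det{E[(y1 - y2)(y1 - y2)']} = det(2 B' Sigma B)."""
    return np.linalg.det(2.0 * B.T @ Sigma @ B)


random_Bs = [np.linalg.qr(rng.standard_normal((p, q)))[0] for _ in range(1000)]
min_trace_random = min(trace_criterion(B) for B in random_Bs)
min_det_random = min(det_criterion(B) for B in random_Bs)
print(trace_criterion(A_star_q), min_trace_random)
print(det_criterion(A_star_q), min_det_random)
```

Both criteria evaluated at `A_star_q` should be no larger than the smallest values found among the random orthonormal choices of $B$.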
                                The properties given in this section and in the previous one show that
                              covariance matrix PCs satisfy several different optimality criteria, but the
                              list of criteria covered is by no means exhaustive; for example, Devijver
                              and Kittler (1982, Chapter 9) show that the first few PCs minimize
                              representation entropy and the last few PCs minimize population entropy.
                              Diamantaras and Kung (1996, Section 3.4) discuss PCA in terms of
                              maximizing mutual information between x and y. Further optimality criteria
                              are given by Hudlet and Johnson (1982), McCabe (1984) and Okamoto
                              (1969). The geometry of PCs is discussed at length by Treasure (1986).
                                The property of self-consistency is useful in a non-linear extension of
                              PCA (see Section 14.1.2). For two p-variate random vectors x, y, the vector
                              y is self-consistent for x if E(x|y) = y. Flury (1997, Section 8.4) shows that
                              if x is a p-variate random vector with a multivariate normal or elliptical
                              distribution, and y is the orthogonal projection of x onto the q-dimensional
                              subspace spanned by the first q PCs for x, then y is self-consistent for x.
                              Tarpey (1999) uses self-consistency of principal components after linear
                              transformation of the variables to characterize elliptical distributions.
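For the multivariate normal case, self-consistency of the PC projection can be illustrated with matrix algebra alone. The sketch below assumes a zero-mean normal x with a hypothetical covariance matrix `Sigma`, and uses the standard formula for a normal conditional mean; the function name `self_consistency_gap` is introduced here for illustration only. Conditioning on the projection $y = Bz$ is equivalent to conditioning on the scores $z = B'x$, because $B$ has full column rank.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 5, 2

# Hypothetical covariance matrix of a zero-mean normal vector x.
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)

# Coefficients of the first q PCs (largest eigenvalues first).
eigvals, eigvecs = np.linalg.eigh(Sigma)
A_q = eigvecs[:, ::-1][:, :q]


def self_consistency_gap(B):
    """Norm of the linear map z -> E(x | z) - y, where z = B'x and y = B z.

    For zero-mean normal x, E(x | z) = Sigma B (B' Sigma B)^{-1} z, so the
    gap is zero exactly when the projection y is self-consistent for x.
    """
    cond_mean = Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B)
    return np.linalg.norm(cond_mean - B)


# Projection onto the span of the first q PCs.
gap_pcs = self_consistency_gap(A_q)

# Projection onto a randomly chosen q-dimensional subspace, for contrast.
B_random, _ = np.linalg.qr(rng.standard_normal((p, q)))
gap_random = self_consistency_gap(B_random)
print(gap_pcs, gap_random)
```

The gap for the PC subspace should vanish up to rounding error, whereas a subspace that is not spanned by eigenvectors of $\Sigma$ generally gives a clearly nonzero gap.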