Page 51 - Jolliffe I. Principal Component Analysis
2. Properties of Population Principal Components
The cross-product terms disappear because of the independence of $\mathbf{x}_1$, $\mathbf{x}_2$,
and hence of $\mathbf{y}_1$, $\mathbf{y}_2$.
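Now, for $i = 1, 2$, we have
$$
\begin{aligned}
E[(\mathbf{y}_i - \mathbf{B}'\boldsymbol{\mu})'(\mathbf{y}_i - \mathbf{B}'\boldsymbol{\mu})]
&= E\{\operatorname{tr}[(\mathbf{y}_i - \mathbf{B}'\boldsymbol{\mu})'(\mathbf{y}_i - \mathbf{B}'\boldsymbol{\mu})]\} \\
&= E\{\operatorname{tr}[(\mathbf{y}_i - \mathbf{B}'\boldsymbol{\mu})(\mathbf{y}_i - \mathbf{B}'\boldsymbol{\mu})']\} \\
&= \operatorname{tr}\{E[(\mathbf{y}_i - \mathbf{B}'\boldsymbol{\mu})(\mathbf{y}_i - \mathbf{B}'\boldsymbol{\mu})']\} \\
&= \operatorname{tr}(\mathbf{B}'\boldsymbol{\Sigma}\mathbf{B}).
\end{aligned}
$$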
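But $\operatorname{tr}(\mathbf{B}'\boldsymbol{\Sigma}\mathbf{B})$ is maximized when $\mathbf{B} = \mathbf{A}_q$, from Property A1, and the
present criterion has been shown above to be $2\operatorname{tr}(\mathbf{B}'\boldsymbol{\Sigma}\mathbf{B})$. Hence Property
G2 is proved.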
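
The maximization step can be checked numerically. The following is a minimal sketch, not part of the original argument: the covariance matrix `Sigma`, the dimensions `p` and `q`, and the helper `criterion` are all hypothetical choices made for illustration, and the comparison matrices are restricted to random $\mathbf{B}$ with orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

p, q = 5, 2
M = rng.standard_normal((p, p))
Sigma = M @ M.T  # hypothetical positive semi-definite covariance matrix

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam, A = eigvals[order], eigvecs[:, order]
A_q = A[:, :q]  # coefficient vectors of the first q PCs


def criterion(B, Sigma):
    """Expected squared distance E[(y1 - y2)'(y1 - y2)] = 2 tr(B' Sigma B)."""
    return 2.0 * np.trace(B.T @ Sigma @ B)


best = criterion(A_q, Sigma)

# Evaluate the same criterion for random B with orthonormal columns (via QR).
others = []
for _ in range(200):
    B, _r = np.linalg.qr(rng.standard_normal((p, q)))
    others.append(criterion(B, Sigma))

print(best, max(others))
```

Under these assumptions `best` equals twice the sum of the `q` largest eigenvalues of `Sigma`, and none of the random orthonormal choices exceeds it.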
There is a closely related property whose geometric interpretation is more
tenuous, namely that with the same definitions as in Property G2,
$$
\det\{E[(\mathbf{y}_1 - \mathbf{y}_2)(\mathbf{y}_1 - \mathbf{y}_2)']\}
$$
is maximized when $\mathbf{B} = \mathbf{A}_q$ (see McCabe (1984)). This property says that
$\mathbf{B} = \mathbf{A}_q$ makes the generalized variance of $(\mathbf{y}_1 - \mathbf{y}_2)$ as large as possible.
Generalized variance may be viewed as an alternative measure of how far
apart $\mathbf{y}_1$ and $\mathbf{y}_2$ are in $q$-dimensional space, though a less intuitively obvious
measure than expected squared Euclidean distance.
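
The determinant criterion reduces to a function of $\mathbf{B}'\boldsymbol{\Sigma}\mathbf{B}$ in the same way as the trace criterion. The lines below are a short worked sketch, assuming as in Property G2 that $\mathbf{y}_i = \mathbf{B}'\mathbf{x}_i$ with $\mathbf{x}_1$, $\mathbf{x}_2$ independent and sharing mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$; the symbols $\lambda_1 \ge \dots \ge \lambda_p$ for the eigenvalues of $\boldsymbol{\Sigma}$ are notation assumed here.

```latex
\begin{aligned}
E[(\mathbf{y}_1 - \mathbf{y}_2)(\mathbf{y}_1 - \mathbf{y}_2)']
  &= \operatorname{cov}(\mathbf{y}_1) + \operatorname{cov}(\mathbf{y}_2)
   = 2\,\mathbf{B}'\boldsymbol{\Sigma}\mathbf{B}, \\
\det\{E[(\mathbf{y}_1 - \mathbf{y}_2)(\mathbf{y}_1 - \mathbf{y}_2)']\}
  &= 2^{q}\det(\mathbf{B}'\boldsymbol{\Sigma}\mathbf{B}), \\
\text{and for } \mathbf{B} = \mathbf{A}_q:\quad
2^{q}\det(\mathbf{A}_q'\boldsymbol{\Sigma}\mathbf{A}_q)
  &= 2^{q}\prod_{k=1}^{q}\lambda_k .
\end{aligned}
```

The first line uses the fact that $\mathbf{y}_1 - \mathbf{y}_2$ has zero mean and that the cross-covariance terms vanish by independence.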
Finally, Property G2 can be reversed in the sense that if $E[(\mathbf{y}_1 - \mathbf{y}_2)'(\mathbf{y}_1 - \mathbf{y}_2)]$
or $\det\{E[(\mathbf{y}_1 - \mathbf{y}_2)(\mathbf{y}_1 - \mathbf{y}_2)']\}$ is to be minimized, then this can be
achieved by taking $\mathbf{B} = \mathbf{A}_q^{*}$.
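
The reversed property can be illustrated in the same numerical fashion. This sketch assumes that $\mathbf{A}_q^{*}$ holds the eigenvectors belonging to the $q$ smallest eigenvalues of $\boldsymbol{\Sigma}$ (the last $q$ PCs), and again compares only against random $\mathbf{B}$ with orthonormal columns; `Sigma`, `trace_crit` and `det_crit` are hypothetical names introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only

p, q = 6, 3
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)  # hypothetical positive-definite covariance

# eigh returns ascending eigenvalues, so the first q columns correspond
# to the q smallest eigenvalues, i.e. the last q PCs.
eigvals, eigvecs = np.linalg.eigh(Sigma)
A_star_q = eigvecs[:, :q]


def trace_crit(B):
    """E[(y1 - y2)'(y1 - y2)] = 2 tr(B' Sigma B)."""
    return 2.0 * np.trace(B.T @ Sigma @ B)


def det_crit(B):
    """det{E[(y1 - y2)(y1 - y2)']} = det(2 B' Sigma B)."""
    return np.linalg.det(2.0 * B.T @ Sigma @ B)


min_trace = trace_crit(A_star_q)
min_det = det_crit(A_star_q)

# Random competitors with orthonormal columns.
trials = [np.linalg.qr(rng.standard_normal((p, q)))[0] for _ in range(200)]
trace_vals = [trace_crit(B) for B in trials]
det_vals = [det_crit(B) for B in trials]

print(min_trace, min(trace_vals))
print(min_det, min(det_vals))
```

Neither criterion falls below its value at `A_star_q` for any of the random trials, which is consistent with the minimizing role of the last `q` PCs.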
The properties given in this section and in the previous one show that
covariance matrix PCs satisfy several different optimality criteria, but the
list of criteria covered is by no means exhaustive; for example, Devijver
and Kittler (1982, Chapter 9) show that the first few PCs minimize
representation entropy and the last few PCs minimize population entropy.
Diamantaras and Kung (1996, Section 3.4) discuss PCA in terms of
maximizing mutual information between $\mathbf{x}$ and $\mathbf{y}$. Further optimality criteria
are given by Hudlet and Johnson (1982), McCabe (1984) and Okamoto
(1969). The geometry of PCs is discussed at length by Treasure (1986).
The property of self-consistency is useful in a non-linear extension of
PCA (see Section 14.1.2). For two $p$-variate random vectors $\mathbf{x}$, $\mathbf{y}$, the vector
$\mathbf{y}$ is self-consistent for $\mathbf{x}$ if $E(\mathbf{x} \mid \mathbf{y}) = \mathbf{y}$. Flury (1997, Section 8.4) shows that
if $\mathbf{x}$ is a $p$-variate random vector with a multivariate normal or elliptical
distribution, and $\mathbf{y}$ is the orthogonal projection of $\mathbf{x}$ onto the $q$-dimensional
subspace spanned by the first $q$ PCs for $\mathbf{x}$, then $\mathbf{y}$ is self-consistent for $\mathbf{x}$.
Tarpey (1999) uses self-consistency of principal components after linear
transformation of the variables to characterize elliptical distributions.
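
For the multivariate normal case, self-consistency of the PC projection can be checked directly from the standard formula for a Gaussian conditional mean. The sketch below is illustrative only and assumes a zero-mean $\mathbf{x}$; `Sigma`, `coef` and the other names are hypothetical. It uses the fact that conditioning on $\mathbf{y} = \mathbf{A}_q\mathbf{A}_q'\mathbf{x}$ is equivalent to conditioning on the PC scores $\mathbf{z} = \mathbf{A}_q'\mathbf{x}$, for which $E(\mathbf{x} \mid \mathbf{z}) = \boldsymbol{\Sigma}\mathbf{A}_q(\mathbf{A}_q'\boldsymbol{\Sigma}\mathbf{A}_q)^{-1}\mathbf{z}$.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, for reproducibility only

p, q = 5, 2
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)  # hypothetical positive-definite covariance

# Columns of A_q: eigenvectors for the q largest eigenvalues of Sigma.
eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending order
A_q = eigvecs[:, ::-1][:, :q]

# Regression coefficients of x on the scores z = A_q' x (zero-mean Gaussian):
# E(x | z) = Sigma A_q (A_q' Sigma A_q)^{-1} z.
coef = Sigma @ A_q @ np.linalg.inv(A_q.T @ Sigma @ A_q)

# Draw one x and compare the conditional mean with the projection y.
x = rng.multivariate_normal(np.zeros(p), Sigma)
z = A_q.T @ x          # PC scores
y = A_q @ z            # orthogonal projection of x onto the PC subspace
cond_mean = coef @ z   # E(x | y) under the Gaussian assumption

print(np.allclose(cond_mean, y))
```

Because $\boldsymbol{\Sigma}\mathbf{A}_q = \mathbf{A}_q\boldsymbol{\Lambda}_q$ and $\mathbf{A}_q'\boldsymbol{\Sigma}\mathbf{A}_q = \boldsymbol{\Lambda}_q$, where $\boldsymbol{\Lambda}_q$ is the diagonal matrix of the $q$ largest eigenvalues, the coefficient matrix collapses to $\mathbf{A}_q$, so the conditional mean coincides with $\mathbf{y}$.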

