2.4 Principal Components with Equal and/or Zero Variances
The final, short, section of this chapter discusses two problems that may arise in theory, but are relatively uncommon in practice. In most of this chapter it has been assumed, implicitly or explicitly, that the eigenvalues of the covariance or correlation matrix are all different, and that none of them is zero.

Equality of eigenvalues, and hence equality of variances of PCs, will occur for certain patterned matrices. The effect of this occurrence is that for a group of q equal eigenvalues, the corresponding q eigenvectors span a certain unique q-dimensional space, but, within this space, they are, apart from being orthogonal to one another, arbitrary. Geometrically (see Property G1), what happens for q = 2 or 3 is that the principal axes of a circle or sphere cannot be uniquely defined; a similar problem arises for hyperspheres when q > 3. Thus individual PCs corresponding to eigenvalues in a group of equal eigenvalues are not uniquely defined. A further problem with equal-variance PCs is that statistical inference becomes more complicated (see Section 3.7).
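As a concrete numerical illustration (not given in the text), the following numpy sketch uses the equicorrelation pattern, a patterned matrix whose last p − 1 eigenvalues coincide; the name Sigma and the choices p = 4, rho = 0.5 are arbitrary. Rotating the eigenvectors of the repeated eigenvalue within their subspace gives another, equally valid, set of eigenvectors, so the individual PCs are not uniquely defined.

import numpy as np

# Equicorrelation matrix: 1 on the diagonal, rho elsewhere. Its eigenvalues are
# 1 + (p-1)*rho (once) and 1 - rho (repeated p-1 times).
p, rho = 4, 0.5
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(np.round(eigvals, 6))                   # [0.5 0.5 0.5 2.5]: three equal eigenvalues

# Any orthogonal rotation of the eigenvectors of the repeated eigenvalue stays
# inside the same 3-dimensional subspace and is an equally valid choice of PCs.
V = eigvecs[:, :3]                            # eigenvectors for the eigenvalue 1 - rho
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))
W = V @ Q                                     # rotated basis of the same subspace
print(np.allclose(Sigma @ W, (1 - rho) * W))  # True: columns of W are also eigenvectors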
The other complication, variances equal to zero, occurs rather more frequently, but is still fairly unusual. If q eigenvalues are zero, then the rank of Σ is (p − q) rather than p, and this outcome necessitates modifications to the proofs of some properties given in Section 2.1 above. Any PC with zero variance defines an exactly constant linear relationship between the elements of x. If such relationships exist, then they imply that one variable is redundant for each relationship, as its value can be determined exactly from the values of the other variables appearing in the relationship. We could therefore reduce the number of variables from p to (p − q) without losing any information. Ideally, exact linear relationships should be spotted before doing a PCA, and the number of variables reduced accordingly. Alternatively, any exact or near-exact linear relationships uncovered by the last few PCs can be used to select a subset of variables that contain most of the information available in all of the original variables. This and related ideas are more relevant to samples than to populations and are discussed further in Sections 3.4 and 6.3.
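To make the zero-variance case concrete, here is a hedged numpy sketch (not from the text): the fourth variable is constructed as an exact linear function of the first two plus a constant, so the sample covariance matrix has one zero eigenvalue, and the corresponding eigenvector recovers the constant linear relationship. The sample size, seed and coefficients are arbitrary choices for illustration.

import numpy as np

# Build p = 4 variables in which x4 = 2*x1 - x2 + 3 holds exactly, so the
# sample covariance matrix has rank 3 and one zero eigenvalue.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
x4 = 2.0 * X[:, 0] - X[:, 1] + 3.0
X = np.column_stack([X, x4])

S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
print(np.round(eigvals, 10))          # smallest eigenvalue is numerically zero

# The eigenvector of the zero eigenvalue gives a linear combination of the
# variables that is exactly constant, i.e. a PC with zero variance.
a = eigvecs[:, 0]
print(np.round(np.std(X @ a), 10))    # essentially zero
print(np.round(a / a[3], 6))          # proportional to (-2, 1, 0, 1): x4 - 2*x1 + x2 is constant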
There will always be the same number of zero eigenvalues for a correlation matrix as for the corresponding covariance matrix, since an exact linear relationship between the elements of x clearly implies an exact linear relationship between the standardized variables, and vice versa. There is not the same equivalence, however, when it comes to considering equal-variance PCs. Equality of some of the eigenvalues in a covariance (correlation) matrix need not imply that any of the eigenvalues of the corresponding correlation (covariance) matrix are equal. A simple example is when the p variables all have equal correlations but unequal variances. If p > 2, then the last (p − 1) eigenvalues of the correlation matrix are equal, whereas the eigenvalues of the corresponding covariance matrix are, in general, all distinct.
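The following hedged sketch (not in the text; the matrix names and numerical values are arbitrary) checks this asymmetry numerically: with equal correlations the correlation matrix has p − 1 equal eigenvalues, while the covariance matrix obtained by giving the variables unequal variances has all of its eigenvalues distinct.

import numpy as np

# Equal correlations (rho) but unequal variances: the correlation matrix has
# p - 1 equal eigenvalues, the covariance matrix generally has none equal.
p, rho = 4, 0.5
R = (1 - rho) * np.eye(p) + rho * np.ones((p, p))   # correlation matrix
sd = np.array([1.0, 2.0, 3.0, 4.0])                 # unequal standard deviations
Sigma = np.outer(sd, sd) * R                        # covariance matrix D R D

print(np.round(np.linalg.eigvalsh(R), 4))       # [0.5 0.5 0.5 2.5]: three equal
print(np.round(np.linalg.eigvalsh(Sigma), 4))   # four distinct eigenvalues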