2.4 Principal Components with Equal and/or Zero Variances
The final, short, section of this chapter discusses two problems that may arise in theory, but are relatively uncommon in practice. In most of this chapter it has been assumed, implicitly or explicitly, that the eigenvalues of the covariance or correlation matrix are all different, and that none of them is zero.

Equality of eigenvalues, and hence equality of variances of PCs, will occur for certain patterned matrices. The effect of this occurrence is that for a group of q equal eigenvalues, the corresponding q eigenvectors span a certain unique q-dimensional space, but, within this space, they are, apart from being orthogonal to one another, arbitrary. Geometrically (see Property G1), what happens for q = 2 or 3 is that the principal axes of a circle or sphere cannot be uniquely defined; a similar problem arises for hyperspheres when q > 3. Thus individual PCs corresponding to eigenvalues in a group of equal eigenvalues are not uniquely defined. A further problem with equal-variance PCs is that statistical inference becomes more complicated (see Section 3.7).
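As a concrete numerical illustration (not given in the text), the following numpy sketch uses the equicorrelation pattern, a patterned matrix whose last p − 1 eigenvalues coincide; the name Sigma and the choices p = 4, rho = 0.5 are arbitrary. Rotating the eigenvectors of the repeated eigenvalue within their subspace gives another, equally valid, set of eigenvectors, so the individual PCs are not uniquely defined.

import numpy as np

# Equicorrelation matrix: 1 on the diagonal, rho elsewhere. Its eigenvalues are
# 1 + (p-1)*rho (once) and 1 - rho (repeated p-1 times).
p, rho = 4, 0.5
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(np.round(eigvals, 6))                   # [0.5 0.5 0.5 2.5]: three equal eigenvalues

# Any orthogonal rotation of the eigenvectors of the repeated eigenvalue stays
# inside the same 3-dimensional subspace and is an equally valid choice of PCs.
V = eigvecs[:, :3]                            # eigenvectors for the eigenvalue 1 - rho
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))
W = V @ Q                                     # rotated basis of the same subspace
print(np.allclose(Sigma @ W, (1 - rho) * W))  # True: columns of W are also eigenvectors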
The other complication, variances equal to zero, occurs rather more frequently, but is still fairly unusual. If q eigenvalues are zero, then the rank of Σ is (p − q) rather than p, and this outcome necessitates modifications to the proofs of some properties given in Section 2.1 above. Any PC with zero variance defines an exactly constant linear relationship between the elements of x. If such relationships exist, then they imply that one variable is redundant for each relationship, as its value can be determined exactly from the values of the other variables appearing in the relationship. We could therefore reduce the number of variables from p to (p − q) without losing any information. Ideally, exact linear relationships should be spotted before doing a PCA, and the number of variables reduced accordingly. Alternatively, any exact or near-exact linear relationships uncovered by the last few PCs can be used to select a subset of variables that contain most of the information available in all of the original variables. This and related ideas are more relevant to samples than to populations and are discussed further in Sections 3.4 and 6.3.
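To make the zero-variance case concrete, here is a hedged numpy sketch (not from the text): the fourth variable is constructed as an exact linear function of the first two plus a constant, so the sample covariance matrix has one zero eigenvalue, and the corresponding eigenvector recovers the constant linear relationship. The sample size, seed and coefficients are arbitrary choices for illustration.

import numpy as np

# Build p = 4 variables in which x4 = 2*x1 - x2 + 3 holds exactly, so the
# sample covariance matrix has rank 3 and one zero eigenvalue.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
x4 = 2.0 * X[:, 0] - X[:, 1] + 3.0
X = np.column_stack([X, x4])

S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
print(np.round(eigvals, 10))          # smallest eigenvalue is numerically zero

# The eigenvector of the zero eigenvalue gives a linear combination of the
# variables that is exactly constant, i.e. a PC with zero variance.
a = eigvecs[:, 0]
print(np.round(np.std(X @ a), 10))    # essentially zero
print(np.round(a / a[3], 6))          # proportional to (-2, 1, 0, 1): x4 - 2*x1 + x2 is constant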
There will always be the same number of zero eigenvalues for a correlation matrix as for the corresponding covariance matrix, since an exact linear relationship between the elements of x clearly implies an exact linear relationship between the standardized variables, and vice versa. There is not the same equivalence, however, when it comes to considering equal-variance PCs. Equality of some of the eigenvalues in a covariance (correlation) matrix need not imply that any of the eigenvalues of the corresponding correlation (covariance) matrix are equal. A simple example is when the p variables all have equal correlations but unequal variances. If p > 2, then the last (p − 1) eigenvalues of the correlation matrix are equal, whereas the eigenvalues of the corresponding covariance matrix are, in general, all distinct.
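The following hedged sketch (not in the text; the matrix names and numerical values are arbitrary) checks this asymmetry numerically: with equal correlations the correlation matrix has p − 1 equal eigenvalues, while the covariance matrix obtained by giving the variables unequal variances has all of its eigenvalues distinct.

import numpy as np

# Equal correlations (rho) but unequal variances: the correlation matrix has
# p - 1 equal eigenvalues, the covariance matrix generally has none equal.
p, rho = 4, 0.5
R = (1 - rho) * np.eye(p) + rho * np.ones((p, p))   # correlation matrix
sd = np.array([1.0, 2.0, 3.0, 4.0])                 # unequal standard deviations
Sigma = np.outer(sd, sd) * R                        # covariance matrix D R D

print(np.round(np.linalg.eigvalsh(R), 4))       # [0.5 0.5 0.5 2.5]: three equal
print(np.round(np.linalg.eigvalsh(Sigma), 4))   # four distinct eigenvalues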