Page 64 - Jolliffe I. Principal Component Analysis

                              the last q columns of a matrix whose kth column is the kth eigenvector
                              of $(X'X)^{-1}$. Furthermore, $(X'X)^{-1}$ has the same eigenvectors as $X'X$,
                              except that their order is reversed, so that $B_q$ must have columns equal to
                              the first q eigenvectors of $X'X$. As this holds for $q = 1, 2, \ldots, p$, Property
                              A7 is proved.
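
                              The key step of the proof, that $(X'X)^{-1}$ has the eigenvectors of $X'X$
                              in reverse order, can be checked numerically. The following is a minimal
                              sketch in Python with NumPy and is not part of the original text; the
                              simulated data, the seed, and the helper name sorted_eigh are
                              assumptions made purely for illustration.

```python
import numpy as np


def sorted_eigh(M):
    """Eigen-decomposition of a symmetric matrix, eigenvalues in descending order.

    Hypothetical helper for this illustration (np.linalg.eigh returns ascending order).
    """
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


# Simulated, column-centred data matrix X (n observations, p variables).
rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p)) @ np.diag([3.0, 2.0, 1.0, 0.5])
X = X - X.mean(axis=0)

XtX = X.T @ X
l, A = sorted_eigh(XtX)                       # eigenvalues l_1 >= ... >= l_p of X'X
m, A_inv = sorted_eigh(np.linalg.inv(XtX))    # same for (X'X)^{-1}

# Eigenvalues of the inverse are the reciprocals, so their order is reversed.
recip_ok = np.allclose(m, 1.0 / l[::-1])

# Eigenvectors agree up to sign once the column order is reversed.
cosines = np.abs(np.sum(A[:, ::-1] * A_inv, axis=0))
vecs_ok = np.allclose(cosines, 1.0)

# With Z = XA, Z'Z is diagonal with entries l_k, so (Z'Z)^{-1} has diagonal 1/l_k.
Z = X @ A
diag_ok = np.allclose(Z.T @ Z, np.diag(l), atol=1e-8 * l[0])

print(recip_ok, vecs_ok, diag_ok)
```

                              Because the eigenvalues in this simulation are well separated, each
                              eigenvector is determined up to sign, which is why the comparison uses
                              absolute cosines rather than element-wise equality.
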
                                This property seems to imply that replacing the predictor variables in a
                              regression analysis by their first few PCs is an attractive idea, as those PCs
                              omitted have coefficients that are estimated with little precision. The flaw in
                              this argument is that nothing in Property A7 takes account of the strength
                              of the relationship between the dependent variable y and the elements of
                              x, or between y and the PCs. A large variance for $\hat{\gamma}_k$, the estimate of
                              the kth element of $\gamma$, and hence an imprecise estimate of the degree of
                              relationship between y and the kth PC, $z_k$, does not preclude a strong
                              relationship between y and $z_k$ (see Section 8.2). Further discussion of Property A7 is given by Fomby
                              et al. (1978).
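
                              The flaw described above can be illustrated with a small simulation.
                              The sketch below is an assumption-laden example, not taken from the
                              text: the data, the noise level sigma, and the choice to make y depend
                              only on the last PC are all invented for illustration. It relies on
                              the standard least squares result that the covariance matrix of the
                              estimated coefficients is $\sigma^2 (Z'Z)^{-1}$, which is diagonal with
                              entries $\sigma^2 / l_k$ when Z holds the PC scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3

# Simulated predictors with very different spreads, then column-centred.
X = rng.normal(size=(n, p)) * np.array([5.0, 2.0, 0.3])
X -= X.mean(axis=0)

# Eigen-decomposition of X'X, eigenvalues in descending order.
vals, vecs = np.linalg.eigh(X.T @ X)
order = np.argsort(vals)[::-1]
l, A = vals[order], vecs[:, order]
Z = X @ A                                  # PC scores; Z'Z = diag(l)

# Assumed model: y depends ONLY on the last (smallest-variance) PC.
sigma = 0.1
y = 2.0 * Z[:, -1] + rng.normal(scale=sigma, size=n)

# Least squares estimates and their variances in the PC basis.
gamma_hat = (Z.T @ y) / l                  # (Z'Z)^{-1} Z'y, since Z'Z is diagonal
var_gamma = sigma**2 / l                   # largest for the last PC

# Sample correlation between y and each PC.
corr = np.array([np.corrcoef(Z[:, k], y)[0, 1] for k in range(p)])

print("var(gamma_hat):", var_gamma)
print("corr(y, z_k):  ", corr)
```

                              In this construction the last PC has the least precisely estimated
                              coefficient of the three, yet it is the only PC related to y, so
                              discarding it on variance grounds alone would remove the entire
                              relationship.
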
                                There are a number of other properties of PCs specific to the sample
                              situation; most have geometric interpretations and are therefore dealt with
                              in the next section.

                              3.2 Geometric Properties of Sample Principal Components


                              As with the algebraic properties, the geometric properties of Chapter 2
                              are also relevant for sample PCs, although with slight modifications to the
                              statistical implications. In addition to these properties, the present section
                              includes a proof of a sample version of Property A5, viewed geometrically,
                              and introduces two extra properties which are relevant to sample, but not
                              population, PCs.
                                Property G1 is still valid for samples if Σ is replaced by S. The ellipsoids
                              $x'S^{-1}x = \text{const}$ no longer have the interpretation of being contours of
                              constant probability, though they will provide estimates of such contours
                              if $x_1, x_2, \ldots, x_n$ are drawn from a multivariate normal distribution.
                              Reintroducing a non-zero mean, the ellipsoids

                                  $$(x - \bar{x})'S^{-1}(x - \bar{x}) = \text{const}$$

                              give contours of equal Mahalanobis distance from the sample mean $\bar{x}$.
                              Flury and Riedwyl (1988, Section 10.6) interpret PCA as successively finding
                              orthogonal directions for which the Mahalanobis distance from the
                              data set to a hypersphere enclosing all the data is minimized (see
                              Sections 5.3, 9.1 and 10.1 for discussion of Mahalanobis distance in a variety
                              of forms).
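
                              The link between the Mahalanobis distance from the sample mean and the
                              sample PCs can be made concrete. The sketch below is illustrative only
                              and assumes simulated multivariate normal data with an arbitrary mean
                              and covariance matrix chosen for the example. It computes the squared
                              distance directly from the inverse of S, and again as the sum of
                              squared PC scores divided by the eigenvalues of S, which is the form
                              showing that the PCs are the principal axes of the ellipsoids.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 3

# Assumed population parameters, used only to simulate data.
mean = [1.0, -2.0, 0.5]
cov = [[4.0, 1.5, 0.0],
       [1.5, 2.0, 0.3],
       [0.0, 0.3, 1.0]]
X = rng.multivariate_normal(mean=mean, cov=cov, size=n)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)        # sample covariance matrix, divisor n - 1
D = X - xbar                       # deviations from the sample mean

# Squared Mahalanobis distance of each observation from the sample mean:
# (x - xbar)' S^{-1} (x - xbar)
d2_direct = np.einsum("ij,jk,ik->i", D, np.linalg.inv(S), D)

# Same quantity via the sample PCs: sum over k of z_k^2 / l_k.
l, A = np.linalg.eigh(S)           # eigenvalues l_k and eigenvectors of S
Z = D @ A                          # PC scores
d2_pc = np.sum(Z**2 / l, axis=1)

print(np.allclose(d2_direct, d2_pc))
```

                              Observations with equal values of d2_direct lie on the same ellipsoid,
                              and in the PC coordinates that ellipsoid has semi-axes proportional to
                              the square roots of the eigenvalues of S.
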