
2.2. Geometric Properties of Population Principal Components
for most other properties of PCs no distributional assumptions are required. However, the property will be discussed further in connection with Property G5 in Section 3.2, where we see that it has some relevance even without the assumption of multivariate normality. Property G5 looks at the sample version of the ellipsoids x'Σx = const. Because Σ and Σ⁻¹ share the same eigenvectors, it follows that the principal axes of the ellipsoids x'Σx = const are the same as those of x'Σ⁻¹x = const, except that their order is reversed.
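As a quick numerical check of this last remark (a minimal sketch of my own, using an arbitrary illustrative Σ rather than anything from the text), the eigendecompositions of Σ and Σ⁻¹ can be compared directly: the eigenvectors coincide, while the eigenvalues are reciprocals and therefore rank the principal axes in reverse order.

    import numpy as np

    # Illustrative positive-definite covariance matrix (my own choice, not from the text).
    Sigma = np.array([[4.0, 1.0, 0.5],
                      [1.0, 3.0, 0.2],
                      [0.5, 0.2, 2.0]])

    # Eigendecompositions of Sigma and of its inverse (eigh returns ascending eigenvalues).
    vals, vecs = np.linalg.eigh(Sigma)
    vals_inv, vecs_inv = np.linalg.eigh(np.linalg.inv(Sigma))

    # Eigenvalues of Sigma^{-1} are the reciprocals of those of Sigma, in reversed order ...
    print(np.allclose(vals_inv, 1.0 / vals[::-1]))   # True

    # ... while the eigenvectors (the principal axes of the two families of ellipsoids)
    # coincide up to sign; only their ranking by eigenvalue is reversed.
    for k in range(3):
        print(np.isclose(abs(vecs[:, k] @ vecs_inv[:, 2 - k]), 1.0))   # True for each axis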
  We digress slightly here to note that some authors imply, or even state explicitly, as do Qian et al. (1994), that PCA needs multivariate normality. This text takes a very different view and considers PCA as a mainly descriptive technique. It will become apparent that many of the properties and applications of PCA and related techniques described in later chapters, as well as the properties discussed in the present chapter, have no need for explicit distributional assumptions. It cannot be disputed that linearity and covariances/correlations, both of which play a central rôle in PCA, have especial relevance when distributions are multivariate normal, but this does not detract from the usefulness of PCA when data have other forms. Qian et al. (1994) describe what might be considered an additional property of PCA, based on minimum description length or stochastic complexity (Rissanen and Yu, 2000), but as they use it to define a somewhat different technique, we defer discussion to Section 14.4.
Property G2. Suppose that x_1, x_2 are independent random vectors, both having the same probability distribution, and that x_1, x_2 are both subjected to the same linear transformation

    y_i = B'x_i,   i = 1, 2.

If B is a (p × q) matrix with orthonormal columns chosen to maximize E[(y_1 − y_2)'(y_1 − y_2)], then B = A_q, using the same notation as before.

Proof. This result could be viewed as a purely algebraic property, and, indeed, the proof below is algebraic. The property is, however, included in the present section because it has a geometric interpretation. This is that the expected squared Euclidean distance, in a q-dimensional subspace, between two vectors of p random variables with the same distribution, is made as large as possible if the subspace is defined by the first q PCs.
  To prove Property G2, first note that x_1, x_2 have the same mean µ and covariance matrix Σ. Hence y_1, y_2 also have the same mean and covariance matrix, B'µ, B'ΣB respectively.

    E[(y_1 − y_2)'(y_1 − y_2)] = E{[(y_1 − B'µ) − (y_2 − B'µ)]'[(y_1 − B'µ) − (y_2 − B'µ)]}
                               = E[(y_1 − B'µ)'(y_1 − B'µ)] + E[(y_2 − B'µ)'(y_2 − B'µ)]
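Although the algebraic argument continues beyond this point, Property G2 can already be illustrated numerically. Each of the two expectations on the right equals tr(B'ΣB), so the quantity being maximized is 2 tr(B'ΣB); the sketch below (my own illustration, with an arbitrary Σ and q, not taken from the text) simply checks that no randomly generated orthonormal B achieves a larger value of this criterion than B = A_q, the matrix whose columns are the first q eigenvectors of Σ.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative covariance matrix and subspace dimension (my own choices, not from the text).
    Sigma = np.array([[4.0, 1.0, 0.5],
                      [1.0, 3.0, 0.2],
                      [0.5, 0.2, 2.0]])
    p, q = 3, 2

    def criterion(B):
        # For independent y_1, y_2 with covariance B' Sigma B, the expected squared
        # distance E[(y_1 - y_2)'(y_1 - y_2)] equals 2 * trace(B' Sigma B).
        return 2.0 * np.trace(B.T @ Sigma @ B)

    # B = A_q: eigenvectors of Sigma belonging to the q largest eigenvalues (the first q PCs).
    eigvals, eigvecs = np.linalg.eigh(Sigma)        # ascending order
    A_q = eigvecs[:, ::-1][:, :q]

    # Compare against many random (p x q) matrices with orthonormal columns.
    best_random = max(
        criterion(np.linalg.qr(rng.standard_normal((p, q)))[0])
        for _ in range(2000)
    )
    print(criterion(A_q))                            # 2 * (lambda_1 + lambda_2)
    print(best_random <= criterion(A_q) + 1e-9)      # True: no random B beats A_q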