Page 65 - Jolliffe I. Principal Component Analysis

3. Properties of Sample Principal Components
Property G2 may also be carried over from populations to samples as follows. Suppose that the observations $x_1, x_2, \ldots, x_n$ are transformed by

$$y_i = B' x_i, \quad i = 1, 2, \ldots, n,$$

where $B$ is a $(p \times q)$ matrix with orthonormal columns, so that $y_1, y_2, \ldots, y_n$ are projections of $x_1, x_2, \ldots, x_n$ onto a $q$-dimensional subspace. Then

$$\sum_{h=1}^{n} \sum_{i=1}^{n} (y_h - y_i)'(y_h - y_i)$$

is maximized when $B = A_q$. Conversely, the same criterion is minimized when $B = A_q^*$.
This property means that if the $n$ observations are projected onto a $q$-dimensional subspace, then the sum of squared Euclidean distances between all pairs of observations in the subspace is maximized when the subspace is defined by the first $q$ PCs, and minimized when it is defined by the last $q$ PCs. The proof that this property holds is again rather similar to that for the corresponding population property and will not be repeated.
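The sample version of Property G2 can be checked numerically. The following is a minimal sketch, not part of the original text: the synthetic data, the random seed, and the helper name `pairwise_sq_dist_sum` are all illustrative assumptions. It compares the criterion for the first $q$ PCs, the last $q$ PCs, and an arbitrary orthonormal $B$.

```python
import numpy as np

def pairwise_sq_dist_sum(Y):
    """Sum over all ordered pairs (h, i) of squared Euclidean distances
    between the rows of Y (hypothetical helper, for illustration only)."""
    diffs = Y[:, None, :] - Y[None, :, :]
    return float(np.sum(diffs ** 2))

rng = np.random.default_rng(0)           # arbitrary seed
n, p, q = 50, 4, 2
# Synthetic observations (rows of X) with unequal variances per variable.
X = rng.normal(size=(n, p)) @ np.diag([3.0, 2.0, 1.0, 0.5])

# Eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue.
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
A = eigvecs[:, ::-1]
A_q = A[:, :q]                           # first q PCs
A_q_star = A[:, -q:]                     # last q PCs

# An arbitrary (p x q) matrix with orthonormal columns, for comparison.
B_rand, _ = np.linalg.qr(rng.normal(size=(p, q)))

# Row i of X @ B is y_i' = x_i' B, i.e. y_i = B' x_i.
crit_first = pairwise_sq_dist_sum(X @ A_q)
crit_last = pairwise_sq_dist_sum(X @ A_q_star)
crit_rand = pairwise_sq_dist_sum(X @ B_rand)

print(crit_last <= crit_rand <= crit_first)
```

The criterion does not depend on the sample mean, because it involves only differences between observations, so no centering step is needed in this sketch.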
The next property to be considered is equivalent to Property A5. Both are concerned, one algebraically and one geometrically, with least squares linear regression of each variable $x_j$ on the $q$ variables contained in $y$.
Property G3. As before, suppose that the observations $x_1, x_2, \ldots, x_n$ are transformed by $y_i = B' x_i$, $i = 1, 2, \ldots, n$, where $B$ is a $(p \times q)$ matrix with orthonormal columns, so that $y_1, y_2, \ldots, y_n$ are projections of $x_1, x_2, \ldots, x_n$ onto a $q$-dimensional subspace. A measure of 'goodness-of-fit' of this $q$-dimensional subspace to $x_1, x_2, \ldots, x_n$ can be defined as the sum of squared perpendicular distances of $x_1, x_2, \ldots, x_n$ from the subspace. This measure is minimized when $B = A_q$.
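Property G3 can likewise be illustrated numerically. The sketch below is not from the original text. It assumes the observations have been centered on their sample mean, so that the fitted subspace passes through the origin, and it uses synthetic data and a hypothetical helper `perp_sq_dist_sum`.

```python
import numpy as np

def perp_sq_dist_sum(X, B):
    """Sum of squared perpendicular distances of the rows of X from the
    subspace spanned by the orthonormal columns of B (hypothetical helper)."""
    M = X @ B @ B.T      # row i is the projection B B' x_i in original coordinates
    R = X - M            # row i is the residual, perpendicular to the subspace
    return float(np.sum(R ** 2))

rng = np.random.default_rng(1)           # arbitrary seed
n, p, q = 40, 3, 1
X = rng.normal(size=(n, p)) @ np.diag([2.5, 1.0, 0.4])
X = X - X.mean(axis=0)                   # assumption: data measured about their means

# Eigenvectors of X'X (proportional to the sample covariance matrix).
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # ascending eigenvalues
A_q = eigvecs[:, ::-1][:, :q]                # first q PCs

# An arbitrary orthonormal B for comparison.
B_rand, _ = np.linalg.qr(rng.normal(size=(p, q)))

fit_pc = perp_sq_dist_sum(X, A_q)
fit_rand = perp_sq_dist_sum(X, B_rand)

print(fit_pc <= fit_rand)
```

In this sketch the minimized value equals the sum of the $p - q$ smallest eigenvalues of $X'X$, which is the part of the total sum of squares that the first $q$ PCs leave unexplained.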

Proof. The vector $y_i$ is an orthogonal projection of $x_i$ onto a $q$-dimensional subspace defined by the matrix $B$. Let $m_i$ denote the position of $y_i$ in terms of the original coordinates, and $r_i = x_i - m_i$. (See Figure 3.1 for the special case where $p = 2$, $q = 1$; in this case $y_i$ is a scalar, whose value is the length of $m_i$.) Because $m_i$ is an orthogonal projection of $x_i$ onto a $q$-dimensional subspace, $r_i$ is orthogonal to the subspace, so $r_i' m_i = 0$. Furthermore, $r_i' r_i$ is the squared perpendicular distance of $x_i$ from the subspace so that the sum of squared perpendicular distances of $x_1, x_2, \ldots, x_n$ from the subspace is

$$\sum_{i=1}^{n} r_i' r_i.$$
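The quantities used so far in the proof can be made concrete for the special case $p = 2$, $q = 1$. This is a small worked sketch, not from the original text; the direction $B$ and the observation $x$ are arbitrary values chosen so that the arithmetic is easy to follow by hand.

```python
import numpy as np

# p = 2, q = 1: B has a single orthonormal column (an assumed unit direction).
B = np.array([[3.0], [4.0]]) / 5.0
x = np.array([2.0, 1.0])      # one illustrative observation

y = B.T @ x                   # coordinate of x within the subspace (here one number)
m = B @ y                     # position of y in the original coordinates
r = x - m                     # residual vector from the subspace to x

r_dot_m = float(r @ m)        # orthogonality check: should be zero
sq_perp = float(r @ r)        # squared perpendicular distance of x from the subspace

# y can be negative in general, so the length of m is compared with |y|.
print(y[0], np.linalg.norm(m), r_dot_m, sq_perp)
```

Here $y = 2$, $m = (1.2, 1.6)'$ and $r = (0.8, -0.6)'$, so $r'm = 0$ and $r'r = 1$.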