
3. Properties of Sample Principal Components
The seventh section then goes on to show how these distributions may be used to make statistical inferences about the population PCs, based on sample PCs.

Section 3.8 demonstrates how the approximate structure and variances of PCs can sometimes be deduced from patterns in the covariance or correlation matrix. Finally, in Section 3.9 we discuss models that have been proposed for PCA. The material could equally well have been included in Chapter 2, but because the idea of maximum likelihood estimation arises in some of the models, we include it in the present chapter.
3.1 Optimal Algebraic Properties of Sample Principal Components


Before looking at the properties themselves, we need to establish some notation. Suppose that we have $n$ independent observations on the $p$-element random vector $\mathbf{x}$; denote these $n$ observations by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. Let $\tilde{z}_{i1} = \mathbf{a}_1'\mathbf{x}_i$, $i = 1, 2, \ldots, n$, and choose the vector of coefficients $\mathbf{a}_1$ to maximize the sample variance
$$
\frac{1}{n-1} \sum_{i=1}^{n} (\tilde{z}_{i1} - \bar{z}_1)^2
$$
subject to the normalization constraint $\mathbf{a}_1'\mathbf{a}_1 = 1$. Next let $\tilde{z}_{i2} = \mathbf{a}_2'\mathbf{x}_i$, $i = 1, 2, \ldots, n$, and choose $\mathbf{a}_2$ to maximize the sample variance of the $\tilde{z}_{i2}$, subject to the normalization constraint $\mathbf{a}_2'\mathbf{a}_2 = 1$ and subject also to the $\tilde{z}_{i2}$ being uncorrelated with the $\tilde{z}_{i1}$ in the sample. Continuing this process in an obvious manner, we have a sample version of the definition of PCs given in Section 1.1. Thus $\mathbf{a}_k'\mathbf{x}$ is defined as the $k$th sample PC, $k = 1, 2, \ldots, p$, and $\tilde{z}_{ik}$ is the score for the $i$th observation on the $k$th PC. If the derivation in Section 1.1 is followed through, but with sample variances and covariances replacing population quantities, then it turns out that the sample variance of the PC scores for the $k$th sample PC is $l_k$, the $k$th largest eigenvalue of the sample covariance matrix $\mathbf{S}$ for $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, and $\mathbf{a}_k$ is the corresponding eigenvector, for $k = 1, 2, \ldots, p$.
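As a concrete check on this result, here is a minimal numerical sketch (not from the book; the data, seed, and variable names are invented for illustration). It eigendecomposes a sample covariance matrix $\mathbf{S}$ and confirms that the sample variance of the $k$th PC's scores equals $l_k$, and that distinct PCs are uncorrelated in the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
# Arbitrary correlated data: n observations on a p-element vector x.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

# Sample covariance matrix S (divisor n - 1).
S = np.cov(X, rowvar=False)

# Eigendecomposition of the symmetric matrix S; reorder so that
# l[0] >= ... >= l[p-1] and column k of A is the corresponding eigenvector.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
l = eigvals[order]
A = eigvecs[:, order]

# PC scores z_ik = a_k' x_i, computed for all i and k at once.
Z = X @ A

# Sample variance of the kth PC's scores equals l_k ...
assert np.allclose(Z.var(axis=0, ddof=1), l)
# ... and distinct PCs are uncorrelated in the sample: cov(Z) is diagonal.
assert np.allclose(np.cov(Z, rowvar=False), np.diag(l))
```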
Define the $(n \times p)$ matrices $\tilde{\mathbf{X}}$ and $\tilde{\mathbf{Z}}$ to have $(i, k)$th elements equal to the value of the $k$th element $\tilde{x}_{ik}$ of $\mathbf{x}_i$, and to $\tilde{z}_{ik}$, respectively. Then $\tilde{\mathbf{Z}}$ and $\tilde{\mathbf{X}}$ are related by $\tilde{\mathbf{Z}} = \tilde{\mathbf{X}}\mathbf{A}$, where $\mathbf{A}$ is the $(p \times p)$ orthogonal matrix whose $k$th column is $\mathbf{a}_k$.
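The matrix relation can be verified directly; this standalone sketch (again with invented data and names) mirrors $\tilde{\mathbf{Z}} = \tilde{\mathbf{X}}\mathbf{A}$ and the orthogonality of $\mathbf{A}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.normal(size=(n, p))       # plays the role of X-tilde

S = np.cov(X, rowvar=False)
_, A = np.linalg.eigh(S)
A = A[:, ::-1]                    # column k of A is a_k (largest eigenvalue first)

Z = X @ A                         # Z-tilde = X-tilde A, with (i, k)th element z_ik

# A is orthogonal: A'A = I_p, so the observations are recoverable as X = Z A'.
assert np.allclose(A.T @ A, np.eye(p))
assert np.allclose(Z @ A.T, X)
```

Because $\mathbf{A}$ is orthogonal, the passage from data to scores is simply a rigid rotation (possibly with a reflection) of the observations.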
If the mean of each element of $\mathbf{x}$ is known to be zero, then $\mathbf{S} = \frac{1}{n}\tilde{\mathbf{X}}'\tilde{\mathbf{X}}$. It is far more usual for the mean of $\mathbf{x}$ to be unknown, and in this case the $(j, k)$th element of $\mathbf{S}$ is
$$
\frac{1}{n-1} \sum_{i=1}^{n} (\tilde{x}_{ij} - \bar{x}_j)(\tilde{x}_{ik} - \bar{x}_k),
$$
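To make the two divisors concrete, here is a small sketch (our own, with arbitrary data) computing $\mathbf{S}$ both ways: the centered form with divisor $n - 1$ for the usual unknown-mean case, and the uncentered form $\frac{1}{n}\tilde{\mathbf{X}}'\tilde{\mathbf{X}}$ when the mean is known to be zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))

# Unknown mean (the usual case): centre each column, then use divisor n - 1.
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (n - 1)
assert np.allclose(S, np.cov(X, rowvar=False))   # matches NumPy's sample covariance

# Mean known to be zero: no centring, divisor n.
S0 = X.T @ X / n
```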