
2. Properties of Population Principal Components
This result will prove to be useful later. Looking at diagonal elements, we see that
\[
\operatorname{var}(x_j) = \sum_{k=1}^{p} \lambda_k \alpha_{kj}^{2}.
\]
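As a minimal numerical sketch of this decomposition of the diagonal elements (assuming NumPy; the covariance matrix and the names Sigma, lam, A are purely illustrative, not taken from the text):

```python
import numpy as np

# Illustrative symmetric positive definite covariance matrix (arbitrary choice).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# Eigendecomposition of Sigma: lam holds the lambda_k and the columns of A
# are the corresponding vectors alpha_k.
lam, A = np.linalg.eigh(Sigma)        # eigenvalues returned in ascending order
lam, A = lam[::-1], A[:, ::-1]        # reorder so that lambda_1 >= ... >= lambda_p

# var(x_j) = sum_k lambda_k * alpha_{kj}^2, i.e. the diagonal of Sigma.
var_from_pcs = (A ** 2) @ lam
print(np.allclose(var_from_pcs, np.diag(Sigma)))   # True
```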
However, perhaps the main statistical implication of the result is that not only can we decompose the combined variances of all the elements of x into decreasing contributions due to each PC, but we can also decompose the whole covariance matrix into contributions $\lambda_k \alpha_k \alpha_k'$ from each PC. Although not strictly decreasing, the elements of $\lambda_k \alpha_k \alpha_k'$ will tend to become smaller as k increases, as $\lambda_k$ decreases for increasing k, whereas the elements of $\alpha_k$ tend to stay ‘about the same size’ because of the normalization constraints
\[
\alpha_k' \alpha_k = 1, \qquad k = 1, 2, \ldots, p.
\]
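A similar sketch (reusing the same illustrative Sigma as above; again all identifiers are my own, with NumPy assumed) shows the whole-matrix decomposition and the tendency of later contributions to have smaller elements:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]                      # lambda_1 >= ... >= lambda_p

# Per-PC contributions lambda_k * alpha_k alpha_k'; they sum to Sigma exactly.
contribs = [lam[k] * np.outer(A[:, k], A[:, k]) for k in range(len(lam))]
print(np.allclose(sum(contribs), Sigma))            # True

# The largest absolute element of each contribution tends to shrink with k,
# since lambda_k decreases while each alpha_k has unit length.
print([round(float(np.abs(C).max()), 3) for C in contribs])
```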
Property A1 emphasizes that the PCs explain, successively, as much as possible of tr(Σ), but the current property shows, intuitively, that they also do a good job of explaining the off-diagonal elements of Σ. This is particularly true when the PCs are derived from a correlation matrix, and is less valid when the covariance matrix is used and the variances of the elements of x are widely different (see Section 2.3).
It is clear from (2.1.10) that the covariance (or correlation) matrix can be constructed exactly, given the coefficients and variances of the first r PCs, where r is the rank of the covariance matrix. Ten Berge and Kiers (1999) discuss conditions under which the correlation matrix can be exactly reconstructed from the coefficients and variances of the first q (< r) PCs.
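A small sketch of the rank statement (the rank-2 matrix Sigma built from B below is my own example; NumPy assumed): when Σ has rank r, the first r contributions already reproduce it exactly.

```python
import numpy as np

# A rank-2 covariance matrix in three dimensions: Sigma = B B' with B of rank 2.
B = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, 1.0]])
Sigma = B @ B.T
r = np.linalg.matrix_rank(Sigma)                    # r = 2 here

lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]

# Using only the first r PCs reconstructs Sigma exactly: the remaining
# eigenvalues are zero, so their contributions vanish.
Sigma_r = sum(lam[k] * np.outer(A[:, k], A[:, k]) for k in range(r))
print(np.allclose(Sigma_r, Sigma))                  # True
```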
A corollary of the spectral decomposition of Σ concerns the conditional distribution of x, given the first q PCs, $z_q$, $q = 1, 2, \ldots, (p-1)$. It can be shown that the linear combination of x that has maximum variance, conditional on $z_q$, is precisely the (q + 1)th PC. To see this, we use the result that the conditional covariance matrix of x, given $z_q$, is
\[
\Sigma - \Sigma_{xz} \Sigma_{zz}^{-1} \Sigma_{zx},
\]
where $\Sigma_{zz}$ is the covariance matrix for $z_q$, $\Sigma_{xz}$ is the (p × q) matrix whose (j, k)th element is the covariance between $x_j$ and $z_k$, and $\Sigma_{zx}$ is the transpose of $\Sigma_{xz}$ (Mardia et al., 1979, Theorem 3.2.4).
It is seen in Section 2.3 that the kth column of $\Sigma_{xz}$ is $\lambda_k \alpha_k$. The matrix $\Sigma_{zz}^{-1}$ is diagonal, with kth diagonal element $\lambda_k^{-1}$, so it follows that
\[
\Sigma_{xz} \Sigma_{zz}^{-1} \Sigma_{zx}
= \sum_{k=1}^{q} \lambda_k \alpha_k \, \lambda_k^{-1} \, \lambda_k \alpha_k'
= \sum_{k=1}^{q} \lambda_k \alpha_k \alpha_k',
\]
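To make the corollary concrete, here is a hedged numerical sketch (NumPy assumed; Sigma, q and the other identifiers are my own illustrative choices). It forms $z_q = A_q' x$, computes $\Sigma_{xz}$, $\Sigma_{zz}$ and the conditional covariance matrix, checks the identity just derived, and confirms that the conditional covariance matrix, being the sum of the remaining contributions $\lambda_k \alpha_k \alpha_k'$ for k > q, has leading eigenvalue $\lambda_{q+1}$ with eigenvector $\alpha_{q+1}$:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]                      # lambda_1 >= ... >= lambda_p

q = 1                                               # condition on the first q PCs
Aq = A[:, :q]

# Covariances involving z_q = Aq' x: Sigma_zz = Aq' Sigma Aq (diagonal, with
# entries lambda_1, ..., lambda_q) and Sigma_xz = Sigma Aq, whose kth column
# is lambda_k * alpha_k.
Sigma_zz = Aq.T @ Sigma @ Aq
Sigma_xz = Sigma @ Aq

# Identity: Sigma_xz Sigma_zz^{-1} Sigma_zx = sum_{k<=q} lambda_k alpha_k alpha_k'.
lhs = Sigma_xz @ np.linalg.inv(Sigma_zz) @ Sigma_xz.T
rhs = sum(lam[k] * np.outer(A[:, k], A[:, k]) for k in range(q))
print(np.allclose(lhs, rhs))                        # True

# The conditional covariance matrix is the sum of the remaining contributions,
# so its largest eigenvalue is lambda_{q+1}, with eigenvector alpha_{q+1}.
cond_cov = Sigma - lhs
w, V = np.linalg.eigh(cond_cov)
print(np.isclose(w[-1], lam[q]))                         # True: lambda_{q+1}
print(np.allclose(np.abs(V[:, -1]), np.abs(A[:, q])))    # True, up to sign
```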