Page 42 - Jolliffe I. Principal Component Analysis

Most of the properties described in this chapter have sample counterparts. Some have greater relevance in the sample context, but it is more convenient to introduce them here rather than in Chapter 3.
2.1 Optimal Algebraic Properties of Population Principal Components and Their Statistical Implications


Consider again the derivation of PCs given in Chapter 1, and denote by $z$ the vector whose $k$th element is $z_k$, the $k$th PC, $k = 1, 2, \ldots, p$. (Unless stated otherwise, the $k$th PC will be taken to mean the PC with the $k$th largest variance, with corresponding interpretations for the "$k$th eigenvalue" and "$k$th eigenvector.") Then

$$z = A'x, \qquad (2.1.1)$$

where $A$ is the orthogonal matrix whose $k$th column, $\alpha_k$, is the $k$th eigenvector of $\Sigma$. Thus, the PCs are defined by an orthonormal linear transformation of $x$. Furthermore, we have directly from the derivation in Chapter 1 that

$$\Sigma A = A\Lambda, \qquad (2.1.2)$$
where $\Lambda$ is the diagonal matrix whose $k$th diagonal element is $\lambda_k$, the $k$th eigenvalue of $\Sigma$, and $\lambda_k = \operatorname{var}(\alpha_k' x) = \operatorname{var}(z_k)$. Because $A$ is orthogonal ($A'A = AA' = I_p$), (2.1.2) can be rewritten in two alternative forms that will be useful later, namely

$$A'\Sigma A = \Lambda \qquad (2.1.3)$$

and

$$\Sigma = A\Lambda A'. \qquad (2.1.4)$$
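The identities (2.1.2) through (2.1.4) can be checked numerically. The sketch below is a minimal illustration, assuming NumPy and a hypothetical 3 × 3 covariance matrix `sigma` chosen only for the example; the helper name `population_pcs` is ours, not the book's. It sorts the eigenpairs so that $\lambda_1 \ge \cdots \ge \lambda_p$, matching the convention that the $k$th PC has the $k$th largest variance, and draws a large simulated sample to show that the sample variances of $z = A'x$ come out close to the $\lambda_k$.

```python
import numpy as np

def population_pcs(sigma):
    """Eigenvalues (decreasing) and matching unit eigenvectors of a covariance matrix.

    Column k of the returned matrix A is alpha_k, the eigenvector for the
    k-th largest eigenvalue lambda_k.
    """
    lam, A = np.linalg.eigh(sigma)   # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]    # reorder so that lambda_1 >= ... >= lambda_p
    return lam[order], A[:, order]

# Hypothetical positive definite covariance matrix, for illustration only.
sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
lam, A = population_pcs(sigma)
Lam = np.diag(lam)

print(np.allclose(sigma @ A, A @ Lam))    # (2.1.2)  Sigma A = A Lambda
print(np.allclose(A.T @ sigma @ A, Lam))  # (2.1.3)  A' Sigma A = Lambda
print(np.allclose(sigma, A @ Lam @ A.T))  # (2.1.4)  Sigma = A Lambda A'

# Simulated check that var(z_k) is close to lambda_k for z = A'x.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), sigma, size=200_000)  # each row is a draw x'
Z = X @ A                                 # row i is (A' x_i)', the PC scores of draw i
print(np.var(Z, axis=0), lam)             # sample variances vs. eigenvalues
```

Using `eigh` rather than `eig` exploits the symmetry of $\Sigma$ and guarantees real eigenvalues and orthonormal eigenvectors, so $A$ is orthogonal up to rounding error.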

The orthonormal linear transformation (2.1.1) of $x$, which defines $z$, has a number of optimal properties, which are now discussed in turn.
Property A1. For any integer $q$, $1 \le q \le p$, consider the orthonormal linear transformation

$$y = B'x, \qquad (2.1.5)$$

where $y$ is a $q$-element vector and $B'$ is a $(q \times p)$ matrix with orthonormal rows (so that $B'B = I_q$), and let $\Sigma_y = B'\Sigma B$ be the variance-covariance matrix for $y$. Then the trace of $\Sigma_y$, denoted $\operatorname{tr}(\Sigma_y)$, is maximized by taking $B = A_q$, where $A_q$ consists of the first $q$ columns of $A$.
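Property A1 can likewise be illustrated numerically. The sketch below, again assuming NumPy, builds a hypothetical random covariance matrix, then compares $\operatorname{tr}(B'\Sigma B)$ at $B = A_q$ with its value at 1000 randomly generated $(p \times q)$ matrices $B$ with orthonormal columns. By (2.1.3), $A_q'\Sigma A_q$ is the diagonal matrix of the $q$ largest eigenvalues, so the value at $A_q$ is $\lambda_1 + \cdots + \lambda_q$. A random search proves nothing; it only makes the statement concrete. The helper names `trace_of_projected_cov` and `random_orthonormal` are ours, chosen for illustration.

```python
import numpy as np

def trace_of_projected_cov(sigma, B):
    """tr(Sigma_y) where y = B'x, so that Sigma_y = B' Sigma B."""
    return np.trace(B.T @ sigma @ B)

def random_orthonormal(p, q, rng):
    """A random (p x q) matrix with orthonormal columns, so that B'B = I_q."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, q)))  # reduced QR: Q is p x q
    return Q

rng = np.random.default_rng(1)
p, q = 5, 2

# Hypothetical covariance matrix: M M' plus a small ridge is symmetric positive definite.
M = rng.standard_normal((p, p))
sigma = M @ M.T + 0.1 * np.eye(p)

lam, A = np.linalg.eigh(sigma)
order = np.argsort(lam)[::-1]        # sort so that lambda_1 >= ... >= lambda_p
lam, A = lam[order], A[:, order]
A_q = A[:, :q]                       # first q columns of A

best = trace_of_projected_cov(sigma, A_q)
trials = [trace_of_projected_cov(sigma, random_orthonormal(p, q, rng))
          for _ in range(1000)]
print(best, lam[:q].sum(), max(trials))  # best equals the sum of the q largest eigenvalues
```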