Page 44 - Jolliffe I. Principal Component Analysis
                               2.1. Optimal Algebraic Properties of Population Principal Components
Now $\sum_{k=1}^{q} c_{jk}^{2}$ is the coefficient of $\lambda_j$ in (2.1.6), the sum of these coefficients is $q$ from (2.1.7), and none of the coefficients can exceed 1, from (2.1.8). Because $\lambda_1 > \lambda_2 > \cdots > \lambda_p$, it is fairly clear that $\sum_{j=1}^{p} \bigl( \sum_{k=1}^{q} c_{jk}^{2} \bigr) \lambda_j$ will be maximized if we can find a set of $c_{jk}$ for which

\[
\sum_{k=1}^{q} c_{jk}^{2} =
\begin{cases}
1, & j = 1, \ldots, q, \\
0, & j = q+1, \ldots, p.
\end{cases}
\tag{2.1.9}
\]


But if $B = A_q$, then

\[
c_{jk} =
\begin{cases}
1, & 1 \le j = k \le q, \\
0, & \text{elsewhere},
\end{cases}
\]

which satisfies (2.1.9). Thus $\operatorname{tr}(\Sigma_y)$ achieves its maximum value when $B = A_q$.
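The weighting argument in this proof can be checked numerically. The following is a minimal sketch in plain Python, assuming an illustrative 3 × 3 orthogonal matrix A and eigenvalues (5, 2, 1) that are not taken from the text. The helper names `random_orthonormal_columns`, `trace_weights` and `trace_sigma_y` are hypothetical. For many random orthonormal B it computes the weights $\sum_k c_{jk}^2$ with $C = A'B$, and confirms that they sum to q, that none exceeds 1, and that the resulting trace never exceeds $\lambda_1 + \cdots + \lambda_q$.

```python
import math
import random

# Illustrative values (assumptions, not from the text).
LAMBDAS = [5.0, 2.0, 1.0]  # eigenvalues, lambda_1 > lambda_2 > lambda_3
A = [[2 / 3, -2 / 3, 1 / 3],
     [1 / 3, 2 / 3, 2 / 3],
     [2 / 3, 1 / 3, -2 / 3]]  # columns are orthonormal eigenvectors alpha_k


def random_orthonormal_columns(p, q, rng):
    """Return q orthonormal p-vectors via Gram-Schmidt on Gaussian draws."""
    cols = []
    while len(cols) < q:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for u in cols:
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        if norm > 1e-8:
            cols.append([b / norm for b in v])
    return cols


def trace_weights(b_cols):
    """Weights w_j = sum_k c_jk^2, where C = A'B, so c_jk = alpha_j . b_k."""
    p = len(A)
    weights = []
    for j in range(p):
        alpha_j = [A[i][j] for i in range(p)]
        w_j = sum(sum(a * b for a, b in zip(alpha_j, b_k)) ** 2
                  for b_k in b_cols)
        weights.append(w_j)
    return weights


def trace_sigma_y(b_cols):
    """tr(B' Sigma B) written as sum_j w_j * lambda_j, as in the proof."""
    return sum(w * lam for w, lam in zip(trace_weights(b_cols), LAMBDAS))


q = 2
rng = random.Random(0)
best = sum(LAMBDAS[:q])  # value attained by B = A_q
for _ in range(1000):
    B = random_orthonormal_columns(3, q, rng)
    w = trace_weights(B)
    assert abs(sum(w) - q) < 1e-9      # weights sum to q
    assert max(w) <= 1 + 1e-9          # no weight exceeds 1
    assert trace_sigma_y(B) <= best + 1e-9

A_q = [[A[i][k] for i in range(3)] for k in range(q)]  # first q columns of A
print(trace_weights(A_q), trace_sigma_y(A_q))
```

With B = A_q the weights are 1 for the first q eigenvalues and 0 for the rest, which is the configuration required by (2.1.9).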
Property A2. Consider again the orthonormal transformation

\[
y = B'x,
\]

with $x$, $B$, $A$ and $\Sigma_y$ defined as before. Then $\operatorname{tr}(\Sigma_y)$ is minimized by taking $B = A_q^{*}$, where $A_q^{*}$ consists of the last $q$ columns of $A$.
                              Proof. The derivation of PCs given in Chapter 1 can easily be turned
                              around for the purpose of looking for, successively, linear functions of x
                              whose variances are as small as possible, subject to being uncorrelated
                              with previous linear functions. The solution is again obtained by finding
                              eigenvectors of Σ, but this time in reverse order, starting with the smallest.
                              The argument that proved Property A1 can be similarly adapted to prove
                              Property A2.
                                The statistical implication of Property A2 is that the last few PCs are
                              not simply unstructured left-overs after removing the important PCs.
                              Because these last PCs have variances as small as possible they are useful in
                              their own right. They can help to detect unsuspected near-constant linear
                              relationships between the elements of x (see Section 3.4), and they may
                              also be useful in regression (Chapter 8), in selecting a subset of variables
                              from x (Section 6.3), and in outlier detection (Section 10.1).
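As a small illustration of how a last PC can expose a near-constant linear relationship, the sketch below uses a hypothetical population covariance matrix in which $x_3 = x_1 + x_2$ plus a little noise. Both the matrix and the helper `last_pc` are assumptions made for this example. To stay within the Python standard library, the smallest-eigenvalue eigenvector is approximated by power iteration on $cI - \Sigma$ with $c = \operatorname{tr}(\Sigma)$. This is a convenient numerical device, not a method described in the text.

```python
import math

# Hypothetical population: x1, x2 uncorrelated with unit variance and
# x3 = x1 + x2 + noise, where the noise has variance 0.01.
SIGMA = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0],
         [1.0, 1.0, 2.01]]


def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def last_pc(sigma, iterations=500):
    """Approximate the eigenvector of sigma with the smallest eigenvalue.

    Power iteration on (c*I - sigma), with c = tr(sigma) >= the largest
    eigenvalue, converges to the direction of minimum variance.
    """
    p = len(sigma)
    c = sum(sigma[i][i] for i in range(p))
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        sv = mat_vec(sigma, v)
        w = [c * v_i - sv_i for v_i, sv_i in zip(v, sv)]
        norm = math.sqrt(sum(w_i * w_i for w_i in w))
        v = [w_i / norm for w_i in w]
    if v[0] < 0:  # the sign of an eigenvector is arbitrary; fix it
        v = [-v_i for v_i in v]
    variance = sum(v_i * sv_i for v_i, sv_i in zip(v, mat_vec(sigma, v)))
    return v, variance


coeffs, variance = last_pc(SIGMA)
print([round(a, 3) for a in coeffs], round(variance, 4))
```

The coefficients come out close to $(1, 1, -1)/\sqrt{3}$ and the variance is far smaller than that of any single variable, so the last PC is effectively the near-constant combination $x_1 + x_2 - x_3$.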
Property A3. (the Spectral Decomposition of Σ)

\[
\Sigma = \lambda_1 \alpha_1 \alpha_1' + \lambda_2 \alpha_2 \alpha_2' + \cdots + \lambda_p \alpha_p \alpha_p'.
\tag{2.1.10}
\]
Proof.

\[
\Sigma = A \Lambda A' \quad \text{from (2.1.4)},
\]

and expanding the right-hand side matrix product shows that $\Sigma$ equals

\[
\sum_{k=1}^{p} \lambda_k \alpha_k \alpha_k',
\]

as required (see the derivation of (2.1.6)).
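The expansion can be verified on a small example. The sketch below, in plain Python, assumes an illustrative orthogonal matrix A and eigenvalues (5, 2, 1) that are not from the text. The helpers `matmul` and `transpose` are hypothetical names. It forms $\Sigma = A \Lambda A'$ as a matrix product, separately accumulates the rank-one terms $\lambda_k \alpha_k \alpha_k'$, and compares the two results.

```python
# Illustrative values (assumptions, not from the text).
LAMBDAS = [5.0, 2.0, 1.0]
A = [[2 / 3, -2 / 3, 1 / 3],
     [1 / 3, 2 / 3, 2 / 3],
     [2 / 3, 1 / 3, -2 / 3]]  # columns alpha_1, alpha_2, alpha_3 are orthonormal
p = len(A)


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


# Sigma = A Lambda A', with Lambda the diagonal matrix of eigenvalues.
LAMBDA = [[LAMBDAS[i] if i == j else 0.0 for j in range(p)] for i in range(p)]
sigma = matmul(matmul(A, LAMBDA), transpose(A))

# Spectral decomposition: sum over k of lambda_k * alpha_k alpha_k'.
spectral_sum = [[0.0] * p for _ in range(p)]
for k in range(p):
    alpha_k = [A[i][k] for i in range(p)]
    for i in range(p):
        for j in range(p):
            spectral_sum[i][j] += LAMBDAS[k] * alpha_k[i] * alpha_k[j]

max_diff = max(abs(sigma[i][j] - spectral_sum[i][j])
               for i in range(p) for j in range(p))
print(max_diff)  # agreement up to floating-point rounding
```

Each term $\lambda_k \alpha_k \alpha_k'$ is a rank-one matrix, so truncating the sum after the first few terms gives a low-rank approximation to $\Sigma$ built from the leading PCs.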