Page 36 - Jolliffe I. Principal Component Analysis

1.1. Definition and Derivation of Principal Components
much of Chapters 2 and 3 and concentrate their attention on later chapters, although Sections 2.3, 2.4, 3.3, 3.4, 3.8, and to a lesser extent 3.5, are likely to be of interest to most readers.
To derive the form of the PCs, consider first $\alpha_1'x$; the vector $\alpha_1$ maximizes $\mathrm{var}[\alpha_1'x] = \alpha_1'\Sigma\alpha_1$. It is clear that, as it stands, the maximum will not be achieved for finite $\alpha_1$, so a normalization constraint must be imposed. The constraint used in the derivation is $\alpha_1'\alpha_1 = 1$, that is, the sum of squares of elements of $\alpha_1$ equals 1. Other constraints, for example $\max_j |\alpha_{1j}| = 1$, may be more useful in other circumstances, and can easily be substituted later on. However, the use of constraints other than $\alpha_1'\alpha_1 = \text{constant}$ in the derivation leads to a more difficult optimization problem, and it will produce a set of derived variables different from the PCs.
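A minimal numerical sketch of why the constraint is needed, using a hypothetical $3 \times 3$ covariance matrix $\Sigma$ and an arbitrary direction $\alpha$ (both assumptions for illustration, not examples from the text): scaling $\alpha$ by $c$ multiplies $\mathrm{var}[\alpha'x]$ by $c^2$, so the variance is unbounded over unnormalized vectors.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix (an assumption for illustration only).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
a = np.array([1.0, 1.0, 1.0])  # an arbitrary, non-normalized direction

def quad(v, S):
    """The quadratic form v'Sv, i.e. the variance of the projection v'x."""
    return v @ S @ v

# Scaling a by c multiplies the variance by c**2, so without a
# normalization constraint the maximum is never attained.
for c in [1.0, 10.0, 100.0]:
    print(c, quad(c * a, Sigma))  # 15.0, then 1500.0, then 150000.0
```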


To maximize $\alpha_1'\Sigma\alpha_1$ subject to $\alpha_1'\alpha_1 = 1$, the standard approach is to use the technique of Lagrange multipliers. Maximize

$$\alpha_1'\Sigma\alpha_1 - \lambda(\alpha_1'\alpha_1 - 1),$$

where $\lambda$ is a Lagrange multiplier. Differentiation with respect to $\alpha_1$ gives

$$\Sigma\alpha_1 - \lambda\alpha_1 = 0,$$

or

$$(\Sigma - \lambda I_p)\alpha_1 = 0,$$

where $I_p$ is the $(p \times p)$ identity matrix. Thus, $\lambda$ is an eigenvalue of $\Sigma$ and $\alpha_1$ is the corresponding eigenvector. To decide which of the $p$ eigenvectors gives $\alpha_1'x$ with maximum variance, note that the quantity to be maximized is

$$\alpha_1'\Sigma\alpha_1 = \alpha_1'\lambda\alpha_1 = \lambda\alpha_1'\alpha_1 = \lambda,$$

so $\lambda$ must be as large as possible. Thus, $\alpha_1$ is the eigenvector corresponding to the largest eigenvalue of $\Sigma$, and $\mathrm{var}(\alpha_1'x) = \alpha_1'\Sigma\alpha_1 = \lambda_1$, the largest eigenvalue.
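This conclusion can be checked numerically. The sketch below uses a hypothetical $3 \times 3$ covariance matrix $\Sigma$ (an assumption, not an example from the text): the eigenvector of the largest eigenvalue attains variance $\lambda_1$, and no random unit vector exceeds it.

```python
import numpy as np

# Hypothetical covariance matrix (an assumption for illustration).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# eigh returns eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
lam1, alpha1 = eigvals[-1], eigvecs[:, -1]  # largest eigenvalue and its eigenvector

# The variance attained at alpha_1 is exactly lambda_1 ...
print(np.isclose(alpha1 @ Sigma @ alpha1, lam1))  # True

# ... and no other unit vector does better (checked on 1000 random directions).
rng = np.random.default_rng(0)
draws = rng.normal(size=(1000, 3))
units = draws / np.linalg.norm(draws, axis=1, keepdims=True)
variances = np.einsum('ij,jk,ik->i', units, Sigma, units)  # u'Sigma u for each row u
print(np.all(variances <= lam1 + 1e-12))  # True
```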


In general, the $k$th PC of $x$ is $\alpha_k'x$ and $\mathrm{var}(\alpha_k'x) = \lambda_k$, where $\lambda_k$ is the $k$th largest eigenvalue of $\Sigma$, and $\alpha_k$ is the corresponding eigenvector. This will now be proved for $k = 2$; the proof for $k \geq 3$ is slightly more complicated, but very similar.
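The general statement can also be illustrated numerically, again with a hypothetical covariance matrix (an assumption for illustration): for every $k$, the variance of the $k$th PC equals the $k$th largest eigenvalue.

```python
import numpy as np

# Hypothetical covariance matrix (an assumption for illustration).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending order
order = np.argsort(eigvals)[::-1]         # re-sort descending: lambda_1 >= lambda_2 >= ...
lams, alphas = eigvals[order], eigvecs[:, order]

# var(alpha_k' x) = alpha_k' Sigma alpha_k = lambda_k for every k.
for k in range(3):
    var_k = alphas[:, k] @ Sigma @ alphas[:, k]
    print(k + 1, np.isclose(var_k, lams[k]))  # each line ends with True
```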
The second PC, $\alpha_2'x$, maximizes $\alpha_2'\Sigma\alpha_2$ subject to being uncorrelated with $\alpha_1'x$, or equivalently subject to $\mathrm{cov}[\alpha_1'x, \alpha_2'x] = 0$, where $\mathrm{cov}(x, y)$ denotes the covariance between the random variables $x$ and $y$. But








$$\mathrm{cov}[\alpha_1'x, \alpha_2'x] = \alpha_1'\Sigma\alpha_2 = \alpha_2'\Sigma\alpha_1 = \alpha_2'\lambda_1\alpha_1 = \lambda_1\alpha_2'\alpha_1 = \lambda_1\alpha_1'\alpha_2.$$

Thus, any of the equations

$$\alpha_1'\Sigma\alpha_2 = 0, \quad \alpha_2'\Sigma\alpha_1 = 0,$$
$$\alpha_1'\alpha_2 = 0, \quad \alpha_2'\alpha_1 = 0$$
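The equivalence of these conditions can be verified numerically for a hypothetical covariance matrix (an assumption for illustration): since eigenvectors of a symmetric matrix are orthogonal, every expression in the chain above vanishes, and the first two PCs are uncorrelated.

```python
import numpy as np

# Hypothetical covariance matrix (an assumption for illustration).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)
alpha1 = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
alpha2 = eigvecs[:, -2]  # eigenvector of the second-largest eigenvalue

# alpha_1' Sigma alpha_2 is the covariance of the two PCs; it equals
# lambda_1 * alpha_1' alpha_2 = 0 by orthogonality of the eigenvectors.
print(np.isclose(alpha1 @ Sigma @ alpha2, 0.0))  # True
print(np.isclose(alpha1 @ alpha2, 0.0))          # True
```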