Page 37 - Jolliffe I. Principal Component Analysis
P. 37

1. Introduction
                              6
                              could be used to specify zero correlation between α x and α x. Choosing

                                                                           1
                              the last of these (an arbitrary choice), and noting that a normalization
                              constraint is again necessary, the quantity to be maximized is   2



                                                α Σα 2 − λ(α α 2 − 1) − φα α 1 ,
                                                                         2
                                                 2
                                                            2
                              where λ, φ are Lagrange multipliers. Differentiation with respect to α 2
                              gives
                                                    Σα 2 − λα 2 − φα 1 = 0
                              and multiplication of this equation on the left by α gives

                                                                           1



                                                α Σα 2 − λα α 2 − φα α 1 =0,
                                                                    1
                                                  1
                                                            1

                              which, since the first two terms are zero and α α 1 = 1, reduces to φ =0.
                                                                       1
                              Therefore, Σα 2 − λα 2 = 0, or equivalently (Σ − λI p )α 2 = 0,so λ is once
                              more an eigenvalue of Σ,and α 2 the corresponding eigenvector.

                                Again, λ = α Σα 2 ,so λ is to be as large as possible. Assuming that
                                            2
                              Σ does not have repeated eigenvalues, a complication that is discussed in
                              Section 2.4, λ cannot equal λ 1 . If it did, it follows that α 2 = α 1 , violating
                              the constraint α α 2 = 0. Hence λ is the second largest eigenvalue of Σ,

                                            1
                              and α 2 is the corresponding eigenvector.
                                As stated above, it can be shown that for the third, fourth, ..., pth
                              PCs, the vectors of coefficients α 3 , α 4 ,..., α p are the eigenvectors of Σ
                              corresponding to λ 3 ,λ 4 ,...,λ p , the third and fourth largest, ..., and the
                              smallest eigenvalue, respectively. Furthermore,

                                                              for k =1, 2,... ,p.
                                              var[α x]= λ k
                                                   k
                                This derivation of the PC coefficients and variances as eigenvectors and
                              eigenvalues of a covariance matrix is standard, but Flury (1988, Section 2.2)
                              and Diamantaras and Kung (1996, Chapter 3) give alternative derivations
                              that do not involve differentiation.
                                It should be noted that sometimes the vectors α k are referred to
                              as ‘principal components.’ This usage, though sometimes defended (see
                              Dawkins (1990), Kuhfeld (1990) for some discussion), is confusing. It is
                              preferable to reserve the term ‘principal components’ for the derived vari-
                              ables α x, and refer to α k as the vector of coefficients or loadings for the

                                    k
                              kth PC. Some authors distinguish between the terms ‘loadings’ and ‘coef-
                              ficients,’ depending on the normalization constraint used, but they will be
                              used interchangeably in this book.
                              1.2 A Brief History of Principal Component
                                    Analysis
                              The origins of statistical techniques are often difficult to trace. Preisendor-
                              fer and Mobley (1988) note that Beltrami (1873) and Jordan (1874)
   32   33   34   35   36   37   38   39   40   41   42