
6. Choosing a Subset of Principal Components or Variables
for large $n$, PRESS($m$) and $W$ are almost equivalent to the much simpler quantities
$$\sum_{k=m+1}^{p} l_k \qquad\text{and}\qquad \frac{l_m}{\sum_{k=m+1}^{p} l_k},$$
respectively. However, Gabriel (personal communication) notes that this conclusion holds only for large sample sizes.
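As a concrete illustration, the short sketch below computes both large-sample surrogates from the ordered sample covariance eigenvalues $l_1 \ge \cdots \ge l_p$. The function name and the use of NumPy are my own choices for illustration, not part of the text.

```python
import numpy as np

def large_n_surrogates(eigvals, m):
    """Large-n approximations to PRESS(m) and W discussed above.

    eigvals : sample covariance eigenvalues l_1 >= ... >= l_p
    m       : number of components retained
    Returns (sum_{k=m+1}^p l_k,  l_m / sum_{k=m+1}^p l_k).
    """
    l = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # ensure descending order
    tail = l[m:].sum()            # sum over the p - m smallest eigenvalues
    return tail, l[m - 1] / tail  # l_m divided by that tail sum
```

These are only the large-$n$ limits; as noted above, the equivalence can fail for small samples.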
In Section 3.9 we introduced the fixed effects model. A number of authors have used this model as a basis for constructing rules to determine $m$, with some of the rules relying on the resampling ideas associated with the bootstrap and jackknife. Recall that the model assumes that the rows $\mathbf{x}_i$ of the data matrix are such that $E(\mathbf{x}_i) = \mathbf{z}_i$, where $\mathbf{z}_i$ lies in a $q$-dimensional space $F_q$. If $\mathbf{e}_i$ is defined as $(\mathbf{x}_i - \mathbf{z}_i)$, then $E(\mathbf{e}_i) = \mathbf{0}$ and $\mathrm{var}(\mathbf{e}_i) = \frac{\sigma^2}{w_i}\boldsymbol{\Gamma}$, where $\boldsymbol{\Gamma}$ is a positive definite symmetric matrix and the $w_i$ are positive scalars whose sum is unity. For fixed $q$, the quantity
$$\sum_{i=1}^{n} w_i \|\mathbf{x}_i - \mathbf{z}_i\|^2_M, \qquad\qquad (6.1.6)$$
given in equation (3.9.1), is to be minimized in order to estimate $\sigma^2$, the $\mathbf{z}_i$ and $F_q$ ($\boldsymbol{\Gamma}$ and the $w_i$ are assumed known). The current selection problem is not only to estimate the unknown parameters, but also to find $q$. We wish our choice of $m$, the number of components retained, to coincide with the true value of $q$, assuming that such a value exists.
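For illustration, a minimal sketch of evaluating the weighted criterion (6.1.6) for a candidate set of fitted points $\mathbf{z}_i$ and metric $\mathbf{M}$ is given below; the function name and interface are assumptions of mine, and the sketch only evaluates the criterion rather than carrying out the minimization.

```python
import numpy as np

def weighted_residual_criterion(X, Z, w, M):
    """Evaluate sum_i w_i ||x_i - z_i||^2_M, cf. equation (6.1.6).

    X : (n, p) data matrix with rows x_i
    Z : (n, p) matrix of fitted points z_i lying in a q-dimensional subspace F_q
    w : (n,) positive weights summing to one
    M : (p, p) symmetric positive definite matrix defining the norm
    """
    E = X - Z                                 # residuals e_i = x_i - z_i
    quad = np.einsum('ij,jk,ik->i', E, M, E)  # e_i' M e_i for each observation
    return float(np.sum(w * quad))
```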
To choose $m$, Ferré (1990) attempts to find $q$ so that it minimizes the loss function
$$f_q = E\Bigl[\sum_{i=1}^{n} w_i \|\mathbf{z}_i - \hat{\mathbf{z}}_i\|^2_{\Gamma^{-1}}\Bigr], \qquad\qquad (6.1.7)$$
where $\hat{\mathbf{z}}_i$ is the projection of $\mathbf{x}_i$ onto $F_q$. The criterion $f_q$ cannot be calculated, but must be estimated, and Ferré (1990) shows that a good estimate of $f_q$ is
$$\hat{f}_q = \sum_{k=q+1}^{p} \hat{\lambda}_k + \sigma^2\Bigl[2q(n+q-p) - np + 2(p-q) + 4\sum_{l=1}^{q}\sum_{k=q+1}^{p} \frac{\hat{\lambda}_l}{(\hat{\lambda}_l - \hat{\lambda}_k)}\Bigr], \qquad\qquad (6.1.8)$$
where $\hat{\lambda}_k$ is the $k$th largest eigenvalue of $\mathbf{V}\boldsymbol{\Gamma}^{-1}$ and
$$\mathbf{V} = \sum_{i=1}^{n} w_i (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'.$$
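A minimal sketch of how the estimate (6.1.8) might be evaluated in practice is given below, assuming $\sigma^2$ is supplied and defaulting to $\boldsymbol{\Gamma} = \mathbf{I}_p$ and $w_i = 1/n$; the function name and NumPy-based interface are assumptions of mine, not Ferré's (1990) implementation.

```python
import numpy as np

def ferre_fhat(X, sigma2, w=None, Gamma=None):
    """Evaluate the estimate (6.1.8) of f_q for q = 1, ..., p - 1.

    X      : (n, p) data matrix
    sigma2 : residual variance sigma^2 (taken as known here)
    w      : weights w_i summing to one (default 1/n)
    Gamma  : positive definite matrix Gamma (default identity)
    Returns an array whose entry q - 1 is f_hat_q.
    """
    n, p = X.shape
    w = np.full(n, 1.0 / n) if w is None else np.asarray(w, dtype=float)
    Gamma = np.eye(p) if Gamma is None else Gamma

    xbar = np.average(X, axis=0, weights=w)
    Xc = X - xbar
    V = (Xc * w[:, None]).T @ Xc   # V = sum_i w_i (x_i - xbar)(x_i - xbar)'
    lam = np.sort(np.linalg.eigvals(V @ np.linalg.inv(Gamma)).real)[::-1]

    f_hat = np.empty(p - 1)
    for q in range(1, p):
        tail = lam[q:].sum()       # sum_{k=q+1}^p lambda_hat_k
        cross = sum(lam[l] / (lam[l] - lam[k])
                    for l in range(q) for k in range(q, p))
        f_hat[q - 1] = tail + sigma2 * (2 * q * (n + q - p) - n * p
                                        + 2 * (p - q) + 4 * cross)
    return f_hat
```

Under this sketch, $m$ would be chosen as the value of $q$ at which the returned array is smallest.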
In the special case where $\boldsymbol{\Gamma} = \mathbf{I}_p$ and $w_i = \frac{1}{n}$, $i = 1, \ldots, n$, we have $\mathbf{V}\boldsymbol{\Gamma}^{-1} = \frac{(n-1)}{n}\mathbf{S}$, and $\hat{\lambda}_k = \frac{(n-1)}{n}\, l_k$, where $l_k$ is the $k$th largest eigenvalue of the sample covariance matrix $\mathbf{S}$. In addition, $\hat{\mathbf{z}}_i$ is the projection of $\mathbf{x}_i$ onto the space spanned by the first $q$ PCs. The residual variance $\sigma^2$ still