Page 270 - Jolliffe I. Principal Component Analysis

10.1. Detection of Outliers Using Principal Components
possible test statistic, $d_{1i}^2$, suggested by Rao (1964) and discussed further by Gnanadesikan and Kettenring (1972), is the sum of squares of the values of the last $q$ ($< p$) PCs, that is
$$d_{1i}^2 = \sum_{k=p-q+1}^{p} z_{ik}^2, \qquad (10.1.1)$$
where $z_{ik}$ is the value of the $k$th PC for the $i$th observation. The statistics $d_{1i}^2$, $i = 1, 2, \ldots, n$, should, approximately, be independent observations from a gamma distribution if there are no outliers, so that a gamma probability plot with a suitably estimated shape parameter may expose outliers (Gnanadesikan and Kettenring, 1972).
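The statistic (10.1.1) is easy to compute once the PC scores are available. The following is a minimal NumPy sketch. It assumes that observations are stored in rows, that the PCs come from the sample covariance matrix, and that the data are centred before the scores are computed. The helper names `pc_scores` and `d1_squared` and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np


def pc_scores(X):
    """Return PC scores Z (n x p) and PC variances l, ordered by decreasing variance."""
    Xc = X - X.mean(axis=0)              # centre each variable (assumption: scores about the mean)
    S = np.cov(Xc, rowvar=False)         # sample covariance matrix, divisor n - 1
    l, A = np.linalg.eigh(S)             # eigh returns eigenvalues in ascending order
    order = np.argsort(l)[::-1]          # reorder so that l_1 >= l_2 >= ... >= l_p
    l, A = l[order], A[:, order]
    Z = Xc @ A                           # z_ik = a_k'(x_i - xbar)
    return Z, l


def d1_squared(X, q):
    """Sum of squares of the last q PC scores for each observation, as in (10.1.1)."""
    Z, _ = pc_scores(X)
    if not 1 <= q <= Z.shape[1]:
        raise ValueError("q must satisfy 1 <= q <= p")
    return np.sum(Z[:, -q:] ** 2, axis=1)


# Hypothetical example: 100 observations on p = 5 variables with unequal spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])
d1 = d1_squared(X, q=2)
print(np.argsort(d1)[::-1][:5])          # indices of the five largest d_1i^2 values
```

The columns of the eigenvector matrix are orthonormal. Setting $q = p$ in this sketch therefore returns each observation's squared Euclidean distance from the sample mean, which makes a convenient sanity check. Eigenvector signs are arbitrary, but they do not affect the squared scores.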
  A possible criticism of the statistic $d_{1i}^2$ is that it still gives insufficient weight to the last few PCs, especially if $q$, the number of PCs contributing to $d_{1i}^2$, is close to $p$. Because the PCs have decreasing variance with increasing index, the values of $z_{ik}^2$ will typically become smaller as $k$ increases, and $d_{1i}^2$ therefore implicitly gives the PCs decreasing weight as $k$ increases. This effect can be severe if some of the PCs have very small variances, and this is unsatisfactory as it is precisely the low-variance PCs which may be most effective in determining the presence of certain types of outlier.
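These decreasing implicit weights can be checked numerically. With centred data, the average of $z_{ik}^2$ over the $n$ observations equals $(n-1)l_k/n$, so the typical contribution of the $k$th PC to $d_{1i}^2$ is proportional to its variance $l_k$. The sketch below verifies this. It assumes NumPy and uses simulated data in which one variable has a deliberately tiny spread; both are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4
# Hypothetical data: the last variable has a very small spread, so the last PC has tiny variance.
X = rng.normal(size=(n, p)) * np.array([4.0, 2.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                   # centred data
S = np.cov(Xc, rowvar=False)              # sample covariance matrix, divisor n - 1
l, A = np.linalg.eigh(S)
order = np.argsort(l)[::-1]               # sort PCs by decreasing variance
l, A = l[order], A[:, order]
Z = Xc @ A                                # PC scores z_ik

# Average contribution of PC k to d_1i^2: the mean over i of z_ik^2.
mean_sq = np.mean(Z ** 2, axis=0)
# For centred data this equals (n - 1) l_k / n, so it shrinks as k increases.
print(np.round(mean_sq, 4))
print(np.round((n - 1) * l / n, 4))
```

In this simulation the last PC's average contribution is well under a hundredth of the first PC's. An observation would therefore need an extreme value of $z_{ip}$ before it could move $d_{1i}^2$ noticeably.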
  An alternative is to give the components equal weight, and this can be achieved by replacing $z_{ik}$ by $z_{ik}^* = z_{ik}/l_k^{1/2}$, where $l_k$ is the variance of the $k$th sample PC. In this case the sample variances of the $z_{ik}^*$ will all be equal to unity. Hawkins (1980, Section 8.2) justifies this particular renormalization of the PCs by noting that the renormalized PCs, in reverse order, are the uncorrelated linear functions $\tilde{\mathbf{a}}_p'\mathbf{x}, \tilde{\mathbf{a}}_{p-1}'\mathbf{x}, \ldots, \tilde{\mathbf{a}}_1'\mathbf{x}$ of $\mathbf{x}$ which, when constrained to have unit variances, have coefficients $\tilde{a}_{jk}$ that successively maximize the criterion $\sum_{j=1}^{p} \tilde{a}_{jk}^2$, for $k = p, (p-1), \ldots, 1$. Maximization of
this criterion is desirable because, given the fixed-variance property, linear functions that have large absolute values for their coefficients are likely to be more sensitive to outliers than those with small coefficients (Hawkins, 1980, Section 8.2). It should be noted that when $q = p$, the statistic
$$d_{2i}^2 = \sum_{k=p-q+1}^{p} \frac{z_{ik}^2}{l_k} \qquad (10.1.2)$$
becomes $\sum_{k=1}^{p} z_{ik}^2/l_k$, which is simply the (squared) Mahalanobis distance $D_i^2$ between the $i$th observation and the sample mean, defined as $D_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}})$. This follows because $\mathbf{S} = \mathbf{A}\mathbf{L}^2\mathbf{A}'$ where, as usual, $\mathbf{L}^2$ is the diagonal matrix whose $k$th diagonal element is $l_k$, and $\mathbf{A}$ is the matrix whose $(j, k)$th element is $a_{jk}$. Furthermore,
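$$\mathbf{S}^{-1} = \mathbf{A}\mathbf{L}^{-2}\mathbf{A}', \qquad \mathbf{x}_i' = \mathbf{z}_i'\mathbf{A}', \qquad \bar{\mathbf{x}}' = \bar{\mathbf{z}}'\mathbf{A}',$$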