Page 271 - Jolliffe I. Principal Component Analysis

10. Outlier Detection, Influential Observations and Robust Estimation
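                              so $D_i^2$ is
$$
\begin{aligned}
(\mathbf{x}_i - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})
  &= (\mathbf{z}_i - \bar{\mathbf{z}})' \mathbf{A}' \mathbf{A} \mathbf{L}^{-2} \mathbf{A}' \mathbf{A} (\mathbf{z}_i - \bar{\mathbf{z}}) \\
  &= (\mathbf{z}_i - \bar{\mathbf{z}})' \mathbf{L}^{-2} (\mathbf{z}_i - \bar{\mathbf{z}}) \\
  &= \sum_{k=1}^{p} \frac{z_{ik}^2}{l_k},
\end{aligned}
$$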
                              where $z_{ik}$ is the kth PC score for the ith observation, measured about the
                              mean of the scores for all observations. Flury (1997, pp. 609-610) suggests
                              that a plot of $(D_i^2 - d_{2i}^2)$ versus $d_{2i}^2$ will reveal observations that are not
                              well represented by the first $(p - q)$ PCs. Such observations are potential
                              outliers.
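
The identity between the Mahalanobis distance and the sum of scaled squared PC scores can be checked numerically. The sketch below is illustrative only: the simulated data, the helper name `pc_scores` and the choice q = 2 are assumptions, and $d_{2i}^2$ is taken to be the sum of $z_{ik}^2 / l_k$ over the last q PCs, which is how the surrounding text reads (its defining equation lies outside this page).

```python
import numpy as np

def pc_scores(X):
    """Hypothetical helper: centred data, covariance matrix, PC scores and
    eigenvalues, with PCs ordered from largest to smallest variance."""
    Xc = X - X.mean(axis=0)                 # x_i - x_bar
    S = np.cov(X, rowvar=False)             # sample covariance, divisor n - 1
    l, A = np.linalg.eigh(S)                # eigh returns ascending eigenvalues
    order = np.argsort(l)[::-1]             # reorder to descending
    l, A = l[order], A[:, order]
    Z = Xc @ A                              # z_ik, measured about the score means
    return Xc, S, Z, l

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # simulated, correlated data
Xc, S, Z, l = pc_scores(X)

# Mahalanobis distance computed directly and via the PC scores
D2_direct = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)
D2_pcs = (Z**2 / l).sum(axis=1)

# Quantities for the plot attributed to Flury: (D_i^2 - d_2i^2) against d_2i^2
q = 2                                        # assumed value, for illustration
d2_last_q = (Z[:, -q:]**2 / l[-q:]).sum(axis=1)
flury_vertical = D2_pcs - d2_last_q          # contribution of the first p - q PCs
flury_horizontal = d2_last_q                 # contribution of the last q PCs
```

Plotting `flury_vertical` against `flury_horizontal` (for example with a scatter plot) would give the display described above; points far along the horizontal axis are those dominated by the low-variance PCs.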
                                Gnanadesikan and Kettenring (1972) consider also the statistic
$$
d_{3i}^2 = \sum_{k=1}^{p} l_k z_{ik}^2, \tag{10.1.3}
$$
                              which emphasizes observations that have a large effect on the first few
                              PCs, and is equivalent to $(\mathbf{x}_i - \bar{\mathbf{x}})' \mathbf{S} (\mathbf{x}_i - \bar{\mathbf{x}})$. As stated earlier, the first
                              few PCs are useful in detecting some types of outlier, and $d_{3i}^2$ emphasizes
                              such outliers. However, we repeat that such outliers are often detectable
                              from plots of the original variables, unlike the outliers exposed by the last
                              few PCs. Various types of outlier, including some that are extreme with
                              respect to both the first few and the last few PCs, are illustrated in
                              the examples given later in this section.
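
The equivalence stated for (10.1.3) can be verified in the same way as the Mahalanobis identity. This is a minimal sketch on simulated data (an assumption, not data from the text); because the sum runs over all p PCs, the ordering of the eigenvalues does not matter here.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 3))   # simulated, correlated data

Xc = X - X.mean(axis=0)                  # x_i - x_bar
S = np.cov(X, rowvar=False)              # sample covariance matrix
l, A = np.linalg.eigh(S)                 # eigenvalues l_k and eigenvectors (columns of A)
Z = Xc @ A                               # centred PC scores z_ik

# d_3i^2 weights each squared score by its eigenvalue, so high-variance PCs dominate
d3_from_pcs = (l * Z**2).sum(axis=1)

# The same quantity as a quadratic form in S (not S^{-1})
d3_direct = np.einsum("ij,jk,ik->i", Xc, S, Xc)
```

Compared with the Mahalanobis form, the weights $l_k$ replace $1/l_k$, which is why this statistic is sensitive to the first few PCs rather than the last few.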
                                Hawkins (1974) prefers to use $d_{2i}^2$ with $q < p$ rather than $q = p$ (again, in
                              order to emphasize the low-variance PCs), and he considers how to choose
                              an appropriate value for q. This is a rather different problem from that
                              considered in Section 6.1, as we now wish to decide how many of the PCs,
                              starting with the last rather than starting with the first, need to be retained.
                              Hawkins (1974) suggests three possibilities for choosing q, including the
                              ‘opposite’ of Kaiser’s rule (Section 6.1.2)—that is, the retention of PCs with
                              eigenvalues less than unity. In an example, he selects q as a compromise
                              between values suggested by his three rules.
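
As an illustration of the 'opposite' of Kaiser's rule only, the sketch below counts the eigenvalues that fall below unity. It assumes the rule is applied to the correlation matrix, as Kaiser's rule conventionally is; the function name `q_reverse_kaiser` and the simulated data are hypothetical, and Hawkins's other two rules and his compromise between them are not reproduced.

```python
import numpy as np

def q_reverse_kaiser(X):
    """Hypothetical helper: number of PCs of the correlation matrix whose
    eigenvalue is less than 1, taken as a candidate value for q."""
    R = np.corrcoef(X, rowvar=False)     # correlation matrix of the variables
    l = np.linalg.eigvalsh(R)            # its eigenvalues (they sum to p)
    return int((l < 1.0).sum())

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + 0.3 * rng.normal(size=200)     # strongly correlated with x1
x3 = rng.normal(size=200)                # roughly unrelated to the others
X = np.column_stack([x1, x2, x3])

q = q_reverse_kaiser(X)
```

Because the eigenvalues of a correlation matrix sum to p, any data set with non-zero sample correlations has at least one eigenvalue below unity and at least one above it, so this rule always yields a q between 1 and p - 1.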
                                Hawkins (1974) also shows that outliers can be successfully detected
                              using the statistic
$$
d_{4i} = \max_{p-q+1 \le k \le p} |z_{ik}^{*}|, \tag{10.1.4}
$$
                              and similar methods for choosing q are again suggested. Fellegi (1975),
                              too, is enthusiastic about the performance of the statistic $d_{4i}$. Hawkins
                              and Fatti (1984) claim that outlier detection is improved still further by
                              a series of transformations, including varimax rotation (see Sections 7.2
                              and 11.1), before computing $d_{4i}$. The test statistic for the ith observation
                              then becomes the maximum absolute value of the last q renormalized and
                              rotated PCs evaluated for that observation.
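
A minimal sketch of the statistic in (10.1.4) follows. It assumes that $z_{ik}^{*}$ denotes the PC score rescaled to unit variance, $z_{ik} / l_k^{1/2}$ (the definition lies outside this page); the function name `d4`, the simulated data and the planted observation are hypothetical. The transformations and varimax rotation of Hawkins and Fatti (1984) are not implemented.

```python
import numpy as np

def d4(X, q):
    """Hypothetical helper: maximum absolute standardized score over the
    last q (lowest-variance) PCs, one value per observation."""
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    l, A = np.linalg.eigh(S)                    # ascending: first q columns = last q PCs
    Z_star = (Xc @ A[:, :q]) / np.sqrt(l[:q])   # assumed z*_ik = z_ik / sqrt(l_k)
    return np.abs(Z_star).max(axis=1)

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)            # x2 tracks x1 closely
X = np.column_stack([x1, x2])
X[0] = [1.0, -1.0]                              # unremarkable on each variable alone,
                                                # but it breaks the x1-x2 relationship

scores = d4(X, q=1)
most_extreme = int(np.argmax(scores))
```

In this toy example the planted observation is not extreme on either original variable, yet it should stand out on the last PC, which is the type of outlier that the low-variance PCs are said to expose.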