Page 268 - Jolliffe I. Principal Component Analysis

10.1. Detection of Outliers Using Principal Components
variables, and will often be extreme with respect to one or both of these variables looked at individually.

By contrast, the last few PCs may detect outliers that are not apparent with respect to the original variables. A strong correlation structure between variables implies that there are linear functions of the variables with small variances compared to the variances of the original variables. In the simple height-and-weight example described above, height and weight have a strong positive correlation, so it is possible to write

x₂ = βx₁ + ε,

where x₁, x₂ are height and weight measured about their sample means, β is a positive constant, and ε is a random variable with a much smaller variance than x₁ or x₂. Therefore the linear function

x₂ − βx₁

has a small variance, and the last (in this case the second) PC in an analysis of x₁, x₂ has a similar form, namely a₂₂x₂ − a₁₂x₁, where a₁₂, a₂₂ > 0. Calculation of the value of this second PC for each observation will detect observations such as (175 cm, 25 kg) that are outliers with respect to the correlation structure of the data, though not necessarily with respect to individual variables. Figure 10.2 shows a plot of the data from Figure 10.1, with respect to the PCs derived from the correlation matrix. The outlying observation is ‘average’ for the first PC, but very extreme for the second.
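The height-and-weight example can be sketched numerically. The following is a minimal illustration, not taken from the book: the simulated sample, the seed, and the regression coefficient are arbitrary choices made only to reproduce a strong positive correlation, with one appended observation of the form (175 cm, 25 kg).

```python
import numpy as np

# Synthetic height/weight data with a strong positive correlation
# (all numbers below are illustrative assumptions, not from the text)
rng = np.random.default_rng(0)
n = 100
height = rng.normal(170, 8, n)                                  # cm
weight = 0.9 * (height - 170) + 65 + rng.normal(0, 3, n)        # kg
X = np.column_stack([height, weight])
X = np.vstack([X, [175.0, 25.0]])   # outlier violating the correlation structure

# PCs derived from the correlation matrix: standardize, then eigendecompose
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))

# np.linalg.eigh returns eigenvalues in ascending order, so column 0 of
# eigvecs is the last (smallest-variance) PC, here of the form a22*x2 - a12*x1
last_pc_scores = Z @ eigvecs[:, 0]

# The appended observation should have by far the most extreme score on
# this last PC, even though neither of its coordinates need be extreme alone
outlier_index = int(np.argmax(np.abs(last_pc_scores)))
```

Because the two variables are nearly collinear, ordinary points score close to zero on the smallest-variance PC, while the (175, 25) point, which breaks the linear relation, scores far from zero.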
This argument generalizes readily when the number of variables p is greater than two; by examining the values of the last few PCs, we may be able to detect observations that violate the correlation structure imposed by the bulk of the data, but that are not necessarily aberrant with respect to individual variables. Of course, it is possible that, if the sample size is relatively small or if a few observations are sufficiently different from the rest, then the outlier(s) may so strongly influence the last few PCs that these PCs now reflect mainly the position of the outlier(s) rather than the structure of the majority of the data. One way of avoiding this masking or camouflage of outliers is to compute PCs leaving out one (or more) observations and then calculate for the deleted observations the values of the last PCs based on the reduced data set. To do this for each observation is a heavy computational burden, but it might be worthwhile in small samples where such camouflaging is, in any case, more likely to occur. Alternatively, if PCs are estimated robustly (see Section 10.4), then the influence of outliers on the last few PCs should be reduced and it may be unnecessary to repeat the analysis with each observation deleted.
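The leave-one-out idea described above can be sketched as follows. This is a minimal illustration under assumed conditions: the function name, the synthetic two-variable data, and the seed are all invented for the example; the book itself prescribes no particular implementation.

```python
import numpy as np

def loo_last_pc_scores(X):
    """For each observation, fit correlation-matrix PCs on the remaining
    n-1 rows and score the held-out row on the smallest-variance PC."""
    n, p = X.shape
    scores = np.empty(n)
    for i in range(n):
        Xr = X[np.arange(n) != i]             # reduced data set, row i deleted
        mu, sd = Xr.mean(axis=0), Xr.std(axis=0)
        R = np.corrcoef((Xr - mu) / sd, rowvar=False)
        _, V = np.linalg.eigh(R)              # eigenvalues in ascending order
        z = (X[i] - mu) / sd                  # standardize deleted obs w.r.t. reduced set
        scores[i] = z @ V[:, 0]               # score on the last (smallest-variance) PC
    return scores

# Illustrative data: strongly correlated pair plus one structure-violating point
rng = np.random.default_rng(1)
h = rng.normal(170, 8, 60)
w = 0.9 * (h - 170) + 65 + rng.normal(0, 3, 60)
X = np.vstack([np.column_stack([h, w]), [175.0, 25.0]])
scores = loo_last_pc_scores(X)
```

When the outlier itself is the deleted observation, the last PC is estimated from clean data, so the outlier cannot camouflage itself; its held-out score is then extreme relative to the others. The O(n) refits are exactly the computational burden the text mentions, which is why the approach suits small samples.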
A series of scatterplots of pairs of the first few and last few PCs may be useful in identifying possible outliers. One way of presenting each PC separately is as a set of parallel boxplots. These have been suggested as a means of deciding how many PCs to retain (see Section 6.1.5), but they may also be useful for flagging potential outliers (Besse, 1994).