Page 265 - Jolliffe I. Principal Component Analysis

10

                              Outlier Detection, Influential
                              Observations, Stability, Sensitivity,
                              and Robust Estimation of Principal
                              Components

                              This chapter deals with four related topics, which are all concerned with
                              situations where some of the observations may, in some way, be atypical of
                              the bulk of the data.
                                First, we discuss the problem of detecting outliers in a set of data. Out-
                              liers are generally viewed as observations that are a long way from, or
                              inconsistent with, the remainder of the data. Such observations can, but
                              need not, have a drastic and disproportionate effect on the results of var-
                              ious analyses of a data set. Numerous methods have been suggested for
                              detecting outliers (see, for example, Barnett and Lewis, 1994; Hawkins,
                              1980); some of the methods use PCs, and these methods are described in
                              Section 10.1.
                                The techniques described in Section 10.1 are useful regardless of the type
                              of statistical analysis to be performed, but in Sections 10.2–10.4 we look
                              specifically at the case where a PCA is being done. Depending on their
                              position, outlying observations may or may not have a large effect on the
                              results of the analysis. It is of interest to determine which observations do
                              indeed have a large effect. Such observations are called influential
                              observations and are discussed in Section 10.2. Leaving out an observation
                              is one type of perturbation to a data set. The sensitivity and stability of
                              PCA with respect to other types of perturbation are the subject of Section 10.3.
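
The idea of treating the deletion of an observation as a perturbation can be sketched numerically. The following is a minimal illustration, not the influence measures developed in Section 10.2: it assumes PCA is based on the sample covariance matrix, and it summarizes the effect of deleting each observation by the largest absolute change in any eigenvalue. The function names (`pca_eigenvalues`, `leave_one_out_influence`), the simulated data, and the planted atypical point are all hypothetical choices made for this example.

```python
import numpy as np


def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance matrix of X, largest first."""
    S = np.cov(X, rowvar=False)          # rows are observations, columns are variables
    return np.linalg.eigvalsh(S)[::-1]   # eigvalsh returns ascending order; reverse it


def leave_one_out_influence(X):
    """Largest absolute eigenvalue change when each observation is deleted in turn.

    This summary is an assumption made for illustration; other measures
    (for example, changes in the eigenvectors) could be used instead.
    """
    full = pca_eigenvalues(X)
    n = X.shape[0]
    influence = np.empty(n)
    for i in range(n):
        reduced = pca_eigenvalues(np.delete(X, i, axis=0))
        influence[i] = np.max(np.abs(reduced - full))
    return influence


# Simulated data: 50 observations on 3 uncorrelated standard normal variables,
# with one observation replaced by a point far from the bulk of the data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = [8.0, 8.0, 8.0]

influence = leave_one_out_influence(X)
most_influential = int(np.argmax(influence))
print("Most influential observation (row index):", most_influential)
```

In this constructed example the planted point is both outlying and influential. As noted above, the two properties need not coincide in general, which is why influence is assessed directly rather than inferred from distance alone.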
                                Given that certain observations are outliers or influential, it may be
                              desirable to adapt the analysis to remove or diminish the effects of such
                              observations; that is, the analysis is made robust. Robust analyses have
                              been developed in many branches of statistics (see, for example, Huber
                              (1981); Hampel et al. (1986) for some of the theoretical background, and