Page 265 - Jolliffe I. Principal Component Analysis
10
Outlier Detection, Influential Observations, Stability, Sensitivity, and Robust Estimation of Principal Components
This chapter deals with four related topics, which are all concerned with
situations where some of the observations may, in some way, be atypical of
the bulk of the data.
First, we discuss the problem of detecting outliers in a set of data.
Outliers are generally viewed as observations that are a long way from, or
inconsistent with, the remainder of the data. Such observations can, but
need not, have a drastic and disproportionate effect on the results of
various analyses of a data set. Numerous methods have been suggested for
detecting outliers (see, for example, Barnett and Lewis, 1994; Hawkins,
1980); some of the methods use PCs, and these methods are described in
Section 10.1.
The techniques described in Section 10.1 are useful regardless of the type
of statistical analysis to be performed, but in Sections 10.2–10.4 we look
specifically at the case where a PCA is being done. Depending on their
position, outlying observations may or may not have a large effect on the
results of the analysis. It is of interest to determine which observations do
indeed have a large effect. Such observations are called influential
observations and are discussed in Section 10.2. Leaving out an observation is
one type of perturbation to a data set. The sensitivity and stability of PCA
with respect to other types of perturbation are the subject of Section 10.3.
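To make the idea of leaving out an observation concrete, the following is a minimal sketch and not a method taken from this chapter. It assumes bivariate data, so that the largest eigenvalue of the sample covariance matrix (the variance of the first PC) has a closed form, and it records how that eigenvalue changes when each observation is deleted in turn. The data values and the function names cov2, largest_eigenvalue and leave_one_out_changes are hypothetical, chosen only for illustration.

```python
import math

def cov2(data):
    """Return (s_xx, s_xy, s_yy), the sample covariance entries (divisor n - 1)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    return sxx, sxy, syy

def largest_eigenvalue(data):
    """Largest eigenvalue of the 2 x 2 sample covariance matrix (closed form)."""
    sxx, sxy, syy = cov2(data)
    half_trace = (sxx + syy) / 2
    return half_trace + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)

def leave_one_out_changes(data):
    """Change in the largest eigenvalue when each observation is deleted in turn."""
    full = largest_eigenvalue(data)
    return [largest_eigenvalue(data[:i] + data[i + 1:]) - full
            for i in range(len(data))]

# Hypothetical data: five points close to a line, plus one atypical point.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (10.0, 0.0)]
changes = leave_one_out_changes(data)

# Index of the observation whose deletion changes the eigenvalue the most.
most_influential = max(range(len(data)), key=lambda i: abs(changes[i]))
print(most_influential, [round(c, 3) for c in changes])
```

In this illustrative data set, deleting the last point changes the first PC's variance far more than deleting any other point. This is only one crude way of quantifying the effect of a single observation, and it says nothing about the effect on the PC coefficients.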
Given that certain observations are outliers or influential, it may be
desirable to adapt the analysis to remove or diminish the effects of such
observations; that is, the analysis is made robust. Robust analyses have
been developed in many branches of statistics (see, for example, Huber
(1981); Hampel et al. (1986) for some of the theoretical background, and

