Page 275 - Jolliffe I. Principal Component Analysis
10. Outlier Detection, Influential Observations and Robust Estimation
study of methods for detecting multivariate outliers. It did well compared to
other methods in some circumstances, particularly when there are multiple
outliers and p is not too large.
Before turning to examples, recall that an example in which outliers
are detected using PCs in a rather different way was given in Section 5.6.
In that example, Andrews’ curves (Andrews, 1972) were computed using
PCs and some of the observations stood out as different from the others
when plotted as curves. Further examination of these different observations
showed that they were indeed ‘outlying’ in some respects, compared to the
remaining observations.
10.1.1 Examples
In this section one example will be discussed in some detail, while three
others will be described more briefly.
Anatomical Measurements
A set of seven anatomical measurements on 28 students was discussed in
Section 5.1.1 and it was found that on a plot of the first two PCs
(Figures 1.3, 5.1) there was an extreme observation on the second PC. When
the measurements of this individual were examined in detail, it was found
that he had an anomalously small head circumference. Whereas the other
27 students all had head girths in the narrow range 21–24 cm, this student
(no. 16) had a measurement of 19 cm. It is impossible to check whether
this was an incorrect measurement or whether student 16 indeed had an
unusually small head (his other measurements were close to average), but
it is clear that this observation would be regarded as an ‘outlier’ according
to most definitions of the term.
This particular outlier is detected on the second PC, and it was suggested
above that any outliers detected by high-variance PCs are usually
detectable on examination of individual variables; this is indeed the case
here. Another point concerning this observation is that it is so extreme on
the second PC that it may be suspected that it alone is largely responsible
for the direction of this PC. This question will be investigated at the end
of Section 10.2, which deals with influential observations.
Figure 1.3 indicates one other possible outlier at the extreme left of
the diagram. This turns out to be the largest student in the class, 190
cm (6 ft 3 in) tall, with all measurements except head girth at least as
large as those of the other 27 students. There is no suspicion here of any
incorrect measurements.
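Both observations above are extreme on individual variables and show up on high-variance PCs. An observation that merely breaks the correlation structure need not be extreme on any single variable, and may be visible only on the low-variance PCs. The following is a minimal sketch of that idea, not the computation used for the anatomical data. It assumes the statistics of equations (10.1.1), (10.1.2) and (10.1.4) take their commonly quoted forms: the sum of squared scores on the last q PCs, the same sum with each term divided by the PC's variance l_k, and the largest absolute standardized score. These forms should be checked against the equations themselves. The function name, the use of the covariance rather than the correlation matrix, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def last_pc_outlier_statistics(X, q):
    """Outlier statistics from the last q PCs (hypothetical helper).

    Assumed forms (verify against equations (10.1.1), (10.1.2), (10.1.4)):
      d1_sq[i] = sum_k z_ik^2
      d2_sq[i] = sum_k z_ik^2 / l_k
      d4[i]    = max_k |z_ik| / sqrt(l_k)
    with k running over the q lowest-variance PCs.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # centre each variable
    S = np.cov(Xc, rowvar=False)         # covariance matrix (an assumption;
                                         # standardize X first for a
                                         # correlation-matrix PCA)
    eigvals, eigvecs = np.linalg.eigh(S) # eigenvalues in ascending order
    l = eigvals[:q]                      # variances of the last q PCs
    Z = Xc @ eigvecs[:, :q]              # scores on the last q PCs
    d1_sq = (Z ** 2).sum(axis=1)
    d2_sq = (Z ** 2 / l).sum(axis=1)
    d4 = np.abs(Z / np.sqrt(l)).max(axis=1)
    return d1_sq, d2_sq, d4

# Synthetic data: two strongly correlated variables, plus one observation
# (index 20) that is unremarkable on each variable but violates the
# correlation between them.
x1 = np.linspace(-1.0, 1.0, 20)
x2 = x1 + 0.05 * np.sin(np.arange(20))
X = np.vstack([np.column_stack([x1, x2]), [0.5, -0.5]])

d1_sq, d2_sq, d4 = last_pc_outlier_statistics(X, q=1)
print("most extreme on d1^2:", int(np.argmax(d1_sq)))
```

In this construction the added observation lies well inside the range of each variable, so it would not be flagged by inspecting the variables one at a time, yet it dominates the last PC. With q equal to the number of variables, the second statistic reduces to the squared Mahalanobis distance from the sample mean.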
Turning now to the last few PCs, we hope to detect any observations
which are ‘outliers’ with respect to the correlation structure of the data.
Figure 10.3 gives a plot of the scores of the observations for the last two
PCs, and Table 10.1 gives the values of $d_{1i}^2$, $d_{2i}^2$ and $d_{4i}$, defined in
equations (10.1.1), (10.1.2) and (10.1.4), respectively, for the six ‘most extreme’

