Page 271 - Jolliffe I. Principal Component Analysis
10. Outlier Detection, Influential Observations and Robust Estimation
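so $D_i^2$ is
$$
\begin{aligned}
(x_i - \bar{x})' S^{-1} (x_i - \bar{x}) &= (z_i - \bar{z})' A' A L^{-2} A' A (z_i - \bar{z}) \\
&= (z_i - \bar{z})' L^{-2} (z_i - \bar{z}) \\
&= \sum_{k=1}^{p} \frac{z_{ik}^2}{l_k},
\end{aligned}
$$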
where $z_{ik}$ is the $k$th PC score for the $i$th observation, measured about the
mean of the scores for all observations. Flury (1997, pp. 609-610) suggests
that a plot of $(D_i^2 - d_{2i}^2)$ versus $d_{2i}^2$ will reveal observations that are not
well represented by the first $(p - q)$ PCs. Such observations are potential
outliers.
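The identity between Mahalanobis distance and the sum of scaled squared PC scores can be checked numerically. The following is a minimal sketch in Python with NumPy, not taken from the text: the synthetic data, the helper names `pc_scores` and `d2_squared`, and the choice $q = 2$ are illustrative assumptions. It also computes the two quantities that would go on the axes of the plot suggested by Flury.

```python
import numpy as np

def pc_scores(X):
    """Return centred PC scores Z and eigenvalues l of the sample
    covariance matrix, ordered from largest to smallest variance."""
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)        # sample covariance, divisor n - 1
    l, A = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    order = np.argsort(l)[::-1]        # reorder to descending
    l, A = l[order], A[:, order]
    return Xc @ A, l

def d2_squared(Z, l, q):
    """Sum of z_ik^2 / l_k over the last q PCs (hypothetical helper)."""
    return np.sum(Z[:, -q:] ** 2 / l[-q:], axis=1)

# Illustrative synthetic data with correlated variables (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
Z, l = pc_scores(X)

# Mahalanobis distance computed directly from the original variables.
Xc = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
D2 = np.einsum("ij,jk,ik->i", Xc, S_inv, Xc)

d2_full = d2_squared(Z, l, q=4)        # q = p: should reproduce D2
d2_last2 = d2_squared(Z, l, q=2)       # q < p: last two PCs only
flury_y = D2 - d2_last2                # part of D2 due to the first p - q PCs
```

Plotting `flury_y` against `d2_last2` gives a display of the kind described above; the plotting call itself is omitted to keep the sketch free of extra dependencies.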
Gnanadesikan and Kettenring (1972) consider also the statistic
$$
d_{3i}^2 = \sum_{k=1}^{p} l_k z_{ik}^2, \qquad (10.1.3)
$$
which emphasizes observations that have a large effect on the first few
PCs, and is equivalent to $(x_i - \bar{x})' S (x_i - \bar{x})$. As stated earlier, the first
few PCs are useful in detecting some types of outlier, and $d_{3i}^2$ emphasizes
such outliers. However, we repeat that such outliers are often detectable
from plots of the original variables, unlike the outliers exposed by the last
few PCs. Various types of outlier, including some that are extreme with
respect to both the first few and the last few PCs, are illustrated in
the examples given later in this section.
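The stated equivalence follows from $S = A L^2 A'$, so that $(x_i - \bar{x})' S (x_i - \bar{x}) = (z_i - \bar{z})' L^2 (z_i - \bar{z}) = \sum_k l_k z_{ik}^2$. A short numerical check, again a sketch on synthetic data that is assumed purely for illustration, might look as follows.

```python
import numpy as np

# Illustrative synthetic data (an assumption, not from the text).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 3))

Xc = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)
l, A = np.linalg.eigh(S)     # eigenvalue order is irrelevant for a sum over all k
Z = Xc @ A                   # centred PC scores

# d_3i^2: squared scores weighted by their variances, so that
# high-variance (first few) PCs dominate.
d3_sq = np.sum(l * Z ** 2, axis=1)

# The same quantity as a quadratic form in the original variables.
quad_form = np.einsum("ij,jk,ik->i", Xc, S, Xc)
```

The two arrays `d3_sq` and `quad_form` agree up to floating-point error.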
Hawkins (1974) prefers to use $d_{2i}^2$ with $q < p$ rather than $q = p$ (again, in
order to emphasize the low-variance PCs), and he considers how to choose
an appropriate value for q. This is a rather different problem from that
considered in Section 6.1, as we now wish to decide how many of the PCs,
starting with the last rather than starting with the first, need to be retained.
Hawkins (1974) suggests three possibilities for choosing q, including the
‘opposite’ of Kaiser’s rule (Section 6.1.2)—that is, the retention of PCs with
eigenvalues less than unity. In an example, he selects q as a compromise
between values suggested by his three rules.
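As an illustration of the 'opposite of Kaiser's rule' only, the sketch below counts the eigenvalues that fall below unity. It assumes the PCs are computed from the correlation matrix, which is the usual setting for Kaiser's rule; the function name `q_reverse_kaiser` and the synthetic data are hypothetical, and Hawkins' other two rules are not shown.

```python
import numpy as np

def q_reverse_kaiser(X, cutoff=1.0):
    """Number of correlation-matrix PCs with eigenvalue below `cutoff`.

    Hypothetical helper: one possible reading of the rule, not
    Hawkins' own procedure.
    """
    R = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)
    return int(np.sum(eigenvalues < cutoff))

# Illustrative data: two nearly collinear variables plus an unrelated one.
rng = np.random.default_rng(3)
t = rng.normal(size=200)
X = np.column_stack([t,
                     t + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
q = q_reverse_kaiser(X)
```

Because the eigenvalues of a correlation matrix sum to $p$, at least one of them lies below unity unless the variables are exactly uncorrelated, so the rule returns $q \ge 1$ in practice.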
Hawkins (1974) also shows that outliers can be successfully detected
using the statistic
$$
d_{4i} = \max_{p-q+1 \le k \le p} |z^*_{ik}|, \qquad (10.1.4)
$$
and similar methods for choosing $q$ are again suggested. Fellegi (1975),
too, is enthusiastic about the performance of the statistic $d_{4i}$. Hawkins
and Fatti (1984) claim that outlier detection is improved still further by
a series of transformations, including varimax rotation (see Sections 7.2
and 11.1), before computing $d_{4i}$. The test statistic for the $i$th observation
then becomes the maximum absolute value of the last $q$ renormalized and
rotated PCs evaluated for that observation.
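A minimal sketch of the unrotated statistic (10.1.4) is given below. It assumes that $z^*_{ik}$ denotes the PC score renormalized to unit variance, $z_{ik} / l_k^{1/2}$; the function name `d4`, the synthetic data and the planted outlier are illustrative assumptions, and the varimax rotation step of Hawkins and Fatti (1984) is not implemented.

```python
import numpy as np

def d4(X, q):
    """Maximum absolute renormalized score over the last q PCs.

    Assumes z*_ik = z_ik / sqrt(l_k); hypothetical helper, no rotation.
    """
    Xc = X - X.mean(axis=0)
    l, A = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(l)[::-1]            # descending variance
    l, A = l[order], A[:, order]
    Z_star = (Xc @ A) / np.sqrt(l)         # unit-variance PC scores
    return np.max(np.abs(Z_star[:, -q:]), axis=1)

# Two highly correlated variables, with one observation planted so that
# it violates the correlation structure without being extreme on either
# variable taken alone.
rng = np.random.default_rng(2)
t = rng.normal(size=100)
X = np.column_stack([t, t + 0.05 * rng.normal(size=100)])
X[0] = [1.0, -1.0]

scores = d4(X, q=1)
suspect = int(np.argmax(scores))           # index of the largest d_4i
```

In this constructed example the planted observation is unremarkable on each original variable but stands out on the last PC, which is the kind of outlier the low-variance PCs are meant to expose.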

