Page 270 - Jolliffe I. Principal Component Analysis
10.1. Detection of Outliers Using Principal Components
possible test statistic, $d_{1i}^2$, suggested by Rao (1964) and discussed further
by Gnanadesikan and Kettenring (1972), is the sum of squares of the values
of the last $q$ ($< p$) PCs, that is
$$d_{1i}^2 = \sum_{k=p-q+1}^{p} z_{ik}^2, \qquad (10.1.1)$$
where $z_{ik}$ is the value of the $k$th PC for the $i$th observation. The
statistics $d_{1i}^2$, $i = 1, 2, \ldots, n$ should, approximately, be independent
observations from a gamma distribution if there are no outliers, so that a gamma
probability plot with suitably estimated shape parameter may expose outliers
(Gnanadesikan and Kettenring, 1972).
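The statistic in (10.1.1) can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes PCs are computed from the sample covariance matrix (divisor $n - 1$) of mean-centred data, and the function name `last_pc_sum_of_squares` and the synthetic data are hypothetical. The gamma probability plot itself is not shown.

```python
import numpy as np

def last_pc_sum_of_squares(X, q):
    """Return d_{1i}^2 for each row of X: the sum of squared scores on the last q PCs."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if not 0 < q <= p:
        raise ValueError("q must satisfy 1 <= q <= p")
    Xc = X - X.mean(axis=0)                 # centre each variable
    S = np.cov(Xc, rowvar=False)            # sample covariance matrix (assumed basis for the PCA)
    l, A = np.linalg.eigh(S)                # eigh returns eigenvalues in ascending order
    order = np.argsort(l)[::-1]             # reorder so that PC 1 has the largest variance
    A = A[:, order]
    Z = Xc @ A                              # z_ik: score of observation i on PC k
    return (Z[:, p - q:] ** 2).sum(axis=1)  # sum over k = p-q+1, ..., p

# Hypothetical data: two strongly correlated variables plus an unrelated third.
rng = np.random.default_rng(0)
t = rng.normal(size=50)
X = np.column_stack([t, t + 0.1 * rng.normal(size=50), rng.normal(size=50)])
# Planted observation: unremarkable on each variable separately,
# but it breaks the correlation between the first two variables.
X[0] = [2.0, -2.0, 0.0]

d1 = last_pc_sum_of_squares(X, q=1)
print(int(np.argmax(d1)))  # index of the observation with the largest d_{1i}^2
```

The planted observation lies inside the marginal range of each variable, so it would be hard to spot one variable at a time; it stands out only on the low-variance PC that captures the near-constant relationship between the first two variables.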
A possible criticism of the statistic $d_{1i}^2$ is that it still gives insufficient
weight to the last few PCs, especially if $q$, the number of PCs contributing to
$d_{1i}^2$, is close to $p$. Because the PCs have decreasing variance with increasing
index, the values of $z_{ik}^2$ will typically become smaller as $k$ increases, and
$d_{1i}^2$ therefore implicitly gives the PCs decreasing weight as $k$ increases. This
effect can be severe if some of the PCs have very small variances, and this
is unsatisfactory as it is precisely the low-variance PCs which may be most
effective in determining the presence of certain types of outlier.
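The implicit weighting can be made concrete with a small numerical sketch. With PCs taken from the sample covariance matrix, the sum over observations of $z_{ik}^2$ equals $(n - 1)\, l_k$, where $l_k$ is the sample variance of the $k$th PC, so each PC's share of the total of the unscaled statistic is proportional to its variance. The function names and the synthetic data below are hypothetical, chosen only for illustration.

```python
import numpy as np

def pc_scores_and_variances(X):
    """Return (Z, l): PC scores and PC variances, with PCs ordered by decreasing variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    l, A = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending eigenvalues
    order = np.argsort(l)[::-1]                      # switch to descending order
    return Xc @ A[:, order], l[order]

def share_of_d1(X, q):
    """Fraction of the total of d_{1i}^2 (summed over i) contributed by each of the last q PCs."""
    Z, _ = pc_scores_and_variances(X)
    p = Z.shape[1]
    totals = (Z[:, p - q:] ** 2).sum(axis=0)         # equals (n - 1) * l_k for each retained k
    return totals / totals.sum()

# Hypothetical data: three independent variables with very different spreads.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 0.1])

shares = share_of_d1(X, q=2)
print(shares)  # the final PC contributes far less than the second-to-last one
```

In this setup the last PC has a variance roughly a hundred times smaller than the one before it, so almost all of the unscaled statistic comes from the second-to-last PC, which is the imbalance the criticism describes.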
An alternative is to give the components equal weight, and this can be
achieved by replacing $z_{ik}$ by $z_{ik}^* = z_{ik}/l_k^{1/2}$, where $l_k$ is the
variance of the $k$th sample PC. In this case the sample variances of the $z_{ik}^*$
will all be equal to unity. Hawkins (1980, Section 8.2) justifies this particular
renormalization of the PCs by noting that the renormalized PCs, in reverse order,
are the uncorrelated linear functions $\tilde{a}_p' x, \tilde{a}_{p-1}' x, \ldots, \tilde{a}_1' x$
of $x$ which, when constrained to have unit variances, have coefficients
$\tilde{a}_{jk}$ that successively maximize the criterion $\sum_{j=1}^{p} \tilde{a}_{jk}^2$,
for $k = p, (p - 1), \ldots, 1$. Maximization of
this criterion is desirable because, given the fixed-variance property, linear
functions that have large absolute values for their coefficients are likely to be
more sensitive to outliers than those with small coefficients (Hawkins, 1980,
Section 8.2). It should be noted that when $q = p$, the statistic
$$d_{2i}^2 = \sum_{k=p-q+1}^{p} \frac{z_{ik}^2}{l_k} \qquad (10.1.2)$$
becomes $\sum_{k=1}^{p} z_{ik}^2 / l_k$, which is simply the (squared) Mahalanobis
distance $D_i^2$ between the $i$th observation and the sample mean, defined as
$D_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$. This follows because $S = A L^2 A'$
where, as usual, $L^2$ is the diagonal matrix whose $k$th diagonal element is $l_k$,
and $A$ is the matrix whose $(j, k)$th element is $a_{jk}$. Furthermore,
$$S^{-1} = A L^{-2} A'$$
$$x_i' = z_i' A'$$
$$\bar{x}' = \bar{z}' A',$$

