Page 437 - Jolliffe I. Principal Component Analysis

14. Generalizations and Adaptations of Principal Component Analysis
generality, that $x_1 \le x_2 \le \dots \le x_n$, and define the sample distribution
function as $F_n(x) = i/n$ for $x_i \le x < x_{i+1}$, $i = 0, 1, \dots, n$, where $x_0$,
$x_{n+1}$ are defined as 0, 1 respectively. Then a well-known test statistic is the
Cramér-von Mises statistic:
$$W_n^2 = n \int_0^1 \bigl(F_n(x) - x\bigr)^2 \, dx.$$
Like most all-purpose goodness-of-fit statistics, $W_n^2$ can detect many
different types of discrepancy between the observations and $G(y)$; a large
value of $W_n^2$ on its own gives no information about what type has occurred.
For this reason a number of authors, for example Durbin and Knott (1972),
Durbin et al. (1975), have looked at decompositions of $W_n^2$ into a number
of separate ‘components,’ each of which measures the degree to which a
different type of discrepancy is present.
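
Because $F_n$ is a step function, the integral defining $W_n^2$ can be evaluated exactly, one interval at a time. The following is a minimal sketch of that calculation, assuming the observations have already been transformed to lie in [0, 1]; the function names are illustrative and do not come from the text. As a cross-check it also evaluates the standard computing formula $W_n^2 = 1/(12n) + \sum_{i=1}^{n} \bigl(x_i - (2i-1)/(2n)\bigr)^2$, which should agree up to rounding error.

```python
import random


def cvm_by_integration(sample):
    """W_n^2 = n * integral_0^1 (F_n(x) - x)^2 dx, integrated piece by piece.

    Assumes every value in `sample` lies in [0, 1].
    """
    x = sorted(sample)
    n = len(x)
    knots = [0.0] + x + [1.0]  # x_0 = 0 and x_{n+1} = 1, as in the text
    total = 0.0
    for i in range(n + 1):
        level = i / n  # F_n(x) = i/n on [x_i, x_{i+1})
        lo, hi = knots[i], knots[i + 1]
        # An antiderivative of (level - x)^2 is (x - level)^3 / 3.
        total += ((hi - level) ** 3 - (lo - level) ** 3) / 3.0
    return n * total


def cvm_computing_formula(sample):
    """Standard computing formula: 1/(12n) + sum_i (x_(i) - (2i - 1)/(2n))^2."""
    x = sorted(sample)
    n = len(x)
    return 1.0 / (12 * n) + sum(
        (xi - (2 * i - 1) / (2 * n)) ** 2 for i, xi in enumerate(x, start=1)
    )


if __name__ == "__main__":
    random.seed(1)  # arbitrary seed, used only for reproducibility
    uniform = [random.random() for _ in range(50)]
    skewed = [u ** 3 for u in uniform]  # pushed towards 0, so not uniform
    for name, data in [("uniform", uniform), ("skewed", skewed)]:
        print(name, cvm_by_integration(data), cvm_computing_formula(data))
```

The skewed sample is included only to illustrate that a single number is returned whatever the nature of the departure from uniformity, which is the limitation the decomposition into components is meant to address.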
It turns out that a ‘natural’ way of partitioning $W_n^2$ is (Durbin and Knott,
1972)
$$W_n^2 = \sum_{k=1}^{\infty} z_{nk}^2,$$
where
$$z_{nk} = (2n)^{1/2} \int_0^1 \bigl(F_n(x) - x\bigr) \sin(k\pi x) \, dx, \quad k = 1, 2, \dots,$$
are the PCs of $\sqrt{n}(F_n(x) - x)$. The phrase ‘PCs of $\sqrt{n}(F_n(x) - x)$’ needs
further explanation, since $\sqrt{n}(F_n(x) - x)$ is not, as is usual when defining
PCs, a $p$-variable vector. Instead, it is an infinite-dimensional random
variable corresponding to the continuum of values for $x$ between zero and one.
Durbin and Knott (1972) solve an equation of the form (12.3.1) to obtain
eigenfunctions $a_k(x)$, and hence corresponding PCs
$$z_{nk} = \sqrt{n} \int_0^1 a_k(x) \bigl(F_n(x) - x\bigr) \, dx,$$
where $a_k(x) = \sqrt{2} \sin(k\pi x)$.
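
A short worked check may help to show why these sine functions arise and why the squared components add up to $W_n^2$. It assumes that the $x_i$ are independent and uniformly distributed on [0, 1], in which case $\sqrt{n}(F_n(x) - x)$ has covariance function $\min(s,t) - st$. The symbols $\rho$ and $\lambda_k$ are notation introduced here for illustration, not taken from the text.

```latex
% Covariance function of \sqrt{n}(F_n(x) - x) under the uniformity assumption
\rho(s,t) = \min(s,t) - st, \qquad 0 \le s,\, t \le 1.

% Eigenequation: a continuous analogue of the matrix eigenproblem of ordinary PCA
\int_0^1 \rho(s,t)\, a_k(t)\, dt = \lambda_k\, a_k(s),
\qquad a_k(x) = \sqrt{2}\,\sin(k\pi x),
\qquad \lambda_k = \frac{1}{k^2\pi^2}.

% The eigenfunctions are orthonormal on [0, 1]
\int_0^1 a_j(x)\, a_k(x)\, dx = \delta_{jk}.

% Since the a_k also form a complete system, Parseval's identity gives the partition
W_n^2 = \int_0^1 \bigl[\sqrt{n}\,(F_n(x) - x)\bigr]^2 dx
      = \sum_{k=1}^{\infty} \Bigl( \sqrt{n} \int_0^1 a_k(x)\,(F_n(x) - x)\, dx \Bigr)^2
      = \sum_{k=1}^{\infty} z_{nk}^2.
```

Under the same assumption each $z_{nk}$ has mean zero and variance $\lambda_k = 1/(k^2\pi^2)$, so the components are ordered by decreasing variance, as PCs usually are.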
The components $z_{nk}$, $k = 1, 2, \dots$, are discussed in considerable detail,
from both theoretical and practical viewpoints, by Durbin and Knott
(1972), and Durbin et al. (1975), who also give several additional references
for the topic.
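
The components can be computed directly from a sample. The sketch below is illustrative only: it assumes the observations lie in [0, 1], uses hypothetical function names, and relies on a simplification that is not quoted from the text. Integrating the definition of $z_{nk}$ by parts gives $z_{nk} = \sqrt{2/n}\,(k\pi)^{-1} \sum_{j=1}^{n} \cos(k\pi x_j)$, and the truncated sum of squared components can then be compared with $W_n^2$ obtained from the standard computing formula.

```python
import math
import random


def cvm_statistic(sample):
    """W_n^2 via the standard computing formula (values assumed in [0, 1])."""
    x = sorted(sample)
    n = len(x)
    return 1.0 / (12 * n) + sum(
        (xi - (2 * i - 1) / (2 * n)) ** 2 for i, xi in enumerate(x, start=1)
    )


def component(sample, k):
    """z_nk = (2n)^(1/2) * integral_0^1 (F_n(x) - x) sin(k pi x) dx.

    Integration by parts reduces the integral to a cosine sum:
    z_nk = sqrt(2/n) / (k pi) * sum_j cos(k pi x_j).
    """
    n = len(sample)
    cos_sum = sum(math.cos(k * math.pi * xj) for xj in sample)
    return math.sqrt(2.0 / n) * cos_sum / (k * math.pi)


if __name__ == "__main__":
    random.seed(1)  # arbitrary seed, used only for reproducibility
    data = [random.random() for _ in range(20)]
    print("W_n^2:", cvm_statistic(data))
    partial = 0.0
    for k in range(1, 2001):
        partial += component(data, k) ** 2
        if k in (1, 2, 5, 20, 200, 2000):
            print("sum of first", k, "squared components:", partial)
```

The partial sums increase towards $W_n^2$ from below, and because the terms decay roughly like $1/k^2$ the first few components typically account for most of the statistic.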
Another use of PCA in goodness-of-fit testing is noted by Jackson (1991,
Section 14.3), namely using an extension to the multivariate case of the
Shapiro-Wilk test for normality, based on PCs rather than on the original
variables. Kaigh (1999) also discusses something described as ‘principal
components’ in the context of goodness-of-fit, but these appear to be related
to Legendre polynomials, rather than being the usual variance-maximizing
PCs.
