Page 395 - Jolliffe I. Principal Component Analysis
13. Principal Component Analysis for Special Types of Data
Cohn (1999) considers four test statistics for deciding the equivalence
or otherwise of subspaces defined by sets of q PCs derived from each of
two covariance matrices corresponding to two groups of observations. One
of the statistics is the likelihood ratio test used by Flury (1988) and two
others are functions of the eigenvalues, or corresponding cosines, derived
by Krzanowski (1979b). The fourth statistic is based on a sequence of two-
dimensional rotations from one subspace towards the other, but simulations
show it to be less reliable than the other three. There are a number of
novel aspects to Cohn’s (1999) study. The first is that the observations
within the two groups are not independent; in his motivating example the
data are serially correlated time series. To derive critical values for the test
statistics, a bootstrap procedure is used, with resampling in blocks because
of the serial correlation. The test statistics are compared in a simulation
study and on the motivating example.
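The block resampling idea can be sketched as follows. This is a minimal illustration of a moving-block bootstrap, which is only one of several block schemes; the text does not say which variant Cohn (1999) uses, so the function name `block_bootstrap_indices` and the `block_length` parameter are assumptions made for the example.

```python
import random


def block_bootstrap_indices(n, block_length, rng):
    """Return n time indices assembled from randomly chosen contiguous blocks.

    Hypothetical helper: a moving-block scheme, assumed for illustration.
    """
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and n")
    indices = []
    while len(indices) < n:
        # Pick a start so that the whole block lies inside the series.
        start = rng.randrange(n - block_length + 1)
        indices.extend(range(start, start + block_length))
    return indices[:n]  # trim the final block so the resample has length n


rng = random.Random(0)
idx = block_bootstrap_indices(n=20, block_length=5, rng=rng)
```

Rows of a data matrix selected with `idx` keep the within-block serial correlation. Recomputing a test statistic over many such resamples would give a reference distribution from which critical values could be read.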
Keramidas et al. (1987) suggest a graphical procedure for comparing
eigenvectors of several covariance matrices $S_1, S_2, \ldots, S_G$. Much of the paper
is concerned with the comparison of a single eigenvector from each
matrix, either with a common predetermined vector or with a ‘typical’
vector that maximizes the sum of squared cosines between itself and the
G eigenvectors to be compared. If $a_{gk}$ is the kth eigenvector for the gth
sample covariance matrix, $g = 1, 2, \ldots, G$, and $a_{0k}$ is the predetermined or
typical vector, then distances
$${}_k\delta_g^2 = \min[(a_{gk} - a_{0k})'(a_{gk} - a_{0k}),\ (a_{gk} + a_{0k})'(a_{gk} + a_{0k})]$$
are calculated. If the sample covariance matrices are drawn from the same
population, then ${}_k\delta_g^2$ has an approximate gamma distribution, so Keramidas
et al. (1987) suggest constructing gamma Q-Q plots to detect differences
from this null situation. Simulations are given for both the null and non-
null cases. Such plots are likely to be more useful when G is large than
when there is only a handful of covariance matrices to be compared.
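The distance calculation can be illustrated with a short sketch. The vectors below are made-up two-dimensional unit vectors, not data from Keramidas et al. (1987), and `delta_squared` is a hypothetical helper name. For unit-length vectors the minimum equals 2(1 − |cos θ|), where θ is the angle between the two vectors, so the arbitrary sign of an eigenvector does not affect the distance.

```python
import math


def delta_squared(a_g, a_0):
    """Smaller of the squared distances between a_0 and +a_g or -a_g."""
    diff = sum((x - y) ** 2 for x, y in zip(a_g, a_0))
    summ = sum((x + y) ** 2 for x, y in zip(a_g, a_0))
    return min(diff, summ)


# Illustrative inputs (assumed): a reference vector and three unit "eigenvectors".
a_0 = [1.0, 0.0]
eigvecs = [[math.cos(t), math.sin(t)] for t in (0.1, -0.2, math.pi - 0.05)]

# The ordered distances are what would be plotted against gamma quantiles.
distances = sorted(delta_squared(a, a_0) for a in eigvecs)
```

The third vector points almost opposite to `a_0`, yet its distance is the smallest of the three because the sign is ignored. Drawing the Q-Q plot itself additionally requires gamma quantiles with estimated parameters, which this sketch leaves out.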
Keramidas et al. (1987) extend their idea to compare subspaces spanned
by two or more eigenvectors. For two subspaces, their overall measure of
similarity, which reduces to ${}_k\delta_g^2$ when single eigenvectors are compared, is
the sum of the square roots $\nu_k^{1/2}$ of eigenvalues of $A_{1q}' A_{2q} A_{2q}' A_{1q}$. Recall
that Krzanowski (1979b) uses the sum of these eigenvalues, not their square
roots, as his measure of overall similarity.
that individual eigenvectors or subspaces should only be compared when
their eigenvalues are well-separated from adjacent eigenvalues so that the
eigenvectors or subspaces are well-defined.
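The two subspace measures can be compared numerically. The sketch below assumes that $A_{1q}$ and $A_{2q}$ are p × q matrices with orthonormal columns (the first q eigenvectors for each group), and it uses random orthonormal bases in place of real eigenvectors. The eigenvalues of $A_{1q}' A_{2q} A_{2q}' A_{1q}$ are the squared singular values of $A_{1q}' A_{2q}$, so both sums follow from a single SVD. The function name `subspace_similarities` is hypothetical.

```python
import numpy as np


def subspace_similarities(A1q, A2q):
    """Return (sum of square roots of eigenvalues, sum of eigenvalues)
    of A1q' A2q A2q' A1q, computed from the singular values of A1q' A2q."""
    s = np.linalg.svd(A1q.T @ A2q, compute_uv=False)
    return s.sum(), (s ** 2).sum()


# Illustrative inputs (assumed): two random 2-dimensional subspaces of R^5.
rng = np.random.default_rng(0)
A1q, _ = np.linalg.qr(rng.standard_normal((5, 2)))
A2q, _ = np.linalg.qr(rng.standard_normal((5, 2)))

sum_sqrt, sum_eig = subspace_similarities(A1q, A2q)
```

Each singular value is the cosine of an angle between the subspaces and lies between 0 and 1, so the sum of eigenvalues never exceeds the sum of their square roots, and both equal q when the subspaces coincide.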
Ten Berge and Kiers (1996) take a different and more complex view of
common principal components than Flury (1988) or Krzanowski (1979b).
They refer to generalizations of PCA to G (≥ 2) groups of individuals, with
different generalizations being appropriate depending on what is taken as
the defining property of PCA. They give three different defining criteria and

