Page 395 - Jolliffe I. Principal Component Analysis

13. Principal Component Analysis for Special Types of Data
                                Cohn (1999) considers four test statistics for deciding the equivalence
                              or otherwise of subspaces defined by sets of q PCs derived from each of
                              two covariance matrices corresponding to two groups of observations. One
                              of the statistics is the likelihood ratio test used by Flury (1988) and two
                              others are functions of the eigenvalues, or corresponding cosines, derived
                              by Krzanowski (1979b). The fourth statistic is based on a sequence of two-
                              dimensional rotations from one subspace towards the other, but simulations
                              show it to be less reliable than the other three. There are a number of
                              novel aspects to Cohn’s (1999) study. The first is that the observations
                              within the two groups are not independent; in his motivating example the
                              data are serially correlated time series. To derive critical values for the test
                              statistics, a bootstrap procedure is used, with resampling in blocks because
                              of the serial correlation. The test statistics are compared in a simulation
                              study and on the motivating example.
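                                Resampling in blocks can be sketched as follows. This is a generic
                              moving-block resampler, not necessarily the scheme Cohn (1999) uses; the
                              function name, the block length and the toy data are illustrative
                              assumptions.

```python
import numpy as np

def moving_block_resample(X, block_length, rng):
    """Resample the rows of X (time along axis 0) in contiguous blocks."""
    n = X.shape[0]
    if not 1 <= block_length <= n:
        raise ValueError("block_length must lie between 1 and the series length")
    n_blocks = -(-n // block_length)  # ceiling division
    # Each block starts at a uniformly chosen time point and runs for block_length steps,
    # so the serial correlation within a block is preserved.
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_length) for s in starts])[:n]
    return X[idx]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4)).cumsum(axis=0)    # toy serially correlated data (assumption)
X_star = moving_block_resample(X, block_length=20, rng=rng)
S_star = np.cov(X_star, rowvar=False)                # covariance matrix of one bootstrap replicate
```

                                Repeating the resampling many times and recomputing a test statistic
                              on each replicate gives an empirical reference distribution. How the null
                              hypothesis is imposed on the resamples is not specified here and is left
                              open in this sketch.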
                                Keramidas et al. (1987) suggest a graphical procedure for comparing
                              eigenvectors of several covariance matrices $S_1, S_2, \ldots, S_G$. Much of the
                              paper is concerned with the comparison of a single eigenvector from each
                              matrix, either with a common predetermined vector or with a ‘typical’
                              vector that maximizes the sum of squared cosines between itself and the
                              G eigenvectors to be compared. If $a_{gk}$ is the kth eigenvector for the gth
                              sample covariance matrix, g = 1, 2, ..., G, and $a_{0k}$ is the predetermined or
                              typical vector, then distances

                                $_k\delta_g^2 = \min[(a_{gk} - a_{0k})'(a_{gk} - a_{0k}),\ (a_{gk} + a_{0k})'(a_{gk} + a_{0k})]$

                              are calculated. If the sample covariance matrices are drawn from the same
                              population, then $_k\delta_g^2$ has an approximate gamma distribution, so
                              Keramidas et al. (1987) suggest constructing gamma Q-Q plots to detect differences
                              from this null situation. Simulations are given for both the null and non-
                              null cases. Such plots are likely to be more useful when G is large than
                              when there is only a handful of covariance matrices to be compared.
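                                The distance calculation can be sketched as follows, assuming unit-length
                              eigenvectors stored as the rows of an array. The function names are
                              illustrative. The ‘typical’ vector is taken as the leading eigenvector of
                              the sum of outer products of the G eigenvectors, which is the unit vector
                              that maximizes the sum of squared cosines.

```python
import numpy as np

def kth_eigenvector(S, k=0):
    """Unit eigenvector of symmetric S with the (k+1)th largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
    return eigvecs[:, ::-1][:, k]

def typical_vector(A):
    """A has shape (G, p), one unit eigenvector per row.

    sum_g (a_g' a0)^2 = a0' (A'A) a0, so the maximizing unit vector a0
    is the leading eigenvector of A'A.
    """
    return kth_eigenvector(A.T @ A, 0)

def delta_squared(A, a0):
    """Squared distance of each row of A from a0, ignoring the arbitrary sign of eigenvectors."""
    d_minus = np.sum((A - a0) ** 2, axis=1)   # (a_g - a0)'(a_g - a0)
    d_plus = np.sum((A + a0) ** 2, axis=1)    # (a_g + a0)'(a_g + a0)
    return np.minimum(d_minus, d_plus)

# Toy data (assumption): G = 30 sample covariance matrices from one population.
rng = np.random.default_rng(1)
scales = np.array([3.0, 2.0, 1.0, 0.5])
S_list = [np.cov(rng.standard_normal((100, 4)) * scales, rowvar=False) for _ in range(30)]
A = np.array([kth_eigenvector(S, 0) for S in S_list])
d2 = np.sort(delta_squared(A, typical_vector(A)))   # ordered distances for a Q-Q plot
```

                                The ordered values in `d2` would then be plotted against quantiles of a
                              fitted gamma distribution (for example from `scipy.stats.gamma.ppf`, with
                              shape and scale estimated from the data) to obtain the Q-Q plot.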
                                Keramidas et al. (1987) extend their idea to compare subspaces spanned
                              by two or more eigenvectors. For two subspaces, their overall measure of
                              similarity, which reduces to $_k\delta_g^2$ when single eigenvectors are compared, is
                              the sum of the square roots $\nu_k^{1/2}$ of the eigenvalues of $A'_{1q} A_{2q} A'_{2q} A_{1q}$. Recall
                              that Krzanowski (1979b) uses the sum of these eigenvalues, not their square
                              roots, as his measure of overall similarity. Keramidas et al. (1987) stress
                              that individual eigenvectors or subspaces should only be compared when
                              their eigenvalues are well-separated from adjacent eigenvalues so that the
                              eigenvectors or subspaces are well-defined.
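                                The two sums can be compared in a short sketch. It assumes that each of
                              the two matrices $A_{1q}$ and $A_{2q}$ is a (p x q) matrix whose orthonormal
                              columns are the first q eigenvectors of the corresponding covariance
                              matrix; the function name is illustrative.

```python
import numpy as np

def subspace_similarity(A1, A2):
    """Return (sum of eigenvalues, sum of square roots of eigenvalues) of A1' A2 A2' A1.

    A1 and A2 are (p, q) arrays with orthonormal columns (assumption).
    """
    M = A1.T @ A2 @ A2.T @ A1                 # symmetric, non-negative definite (q x q)
    nu = np.linalg.eigvalsh(M)
    nu = np.clip(nu, 0.0, 1.0)                # guard against rounding just outside [0, 1]
    return nu.sum(), np.sqrt(nu).sum()

# Two one-dimensional subspaces in the plane, 45 degrees apart.
A1 = np.array([[1.0], [0.0]])
A2 = np.array([[1.0], [1.0]]) / np.sqrt(2.0)
sum_nu, sum_sqrt_nu = subspace_similarity(A1, A2)
# sum_nu is cos^2(45 degrees) = 0.5; sum_sqrt_nu is cos(45 degrees), about 0.7071
```

                                Both quantities equal q when the two subspaces coincide and 0 when they
                              are orthogonal, but they weight intermediate angles differently.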
                                Ten Berge and Kiers (1996) take a different and more complex view of
                              common principal components than Flury (1988) or Krzanowski (1979b).
                              They refer to generalizations of PCA to G (≥ 2) groups of individuals, with
                              different generalizations being appropriate depending on what is taken as
                              the defining property of PCA. They give three different defining criteria and