Page 425 - Jolliffe I. Principal Component Analysis

14. Generalizations and Adaptations of Principal Component Analysis
                                Reyment and Jöreskog (1993, Section 8.7) discuss an application of the
                              method (which they refer to as Imbrie’s Q-mode method) in a similar context
                              concerning the abundance of various marine micro-organisms in cores
                              taken at a number of sites on the seabed. The same authors also suggest
                              that this type of analysis is relevant for data where the p variables are
                              amounts of p chemical constituents in n soil or rock samples. If the degree
                              to which two samples have the same proportions of each constituent is considered
                              to be an important index of similarity between samples, then the
                              similarity measure implied by non-centred PCA is appropriate (Reyment
                              and Jöreskog, 1993, Section 5.4). An alternative approach if proportions are
                              of interest is to reduce the data to compositional form (see Section 13.3).
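The link between non-centred analysis and proportions can be illustrated with a minimal sketch. It assumes that the similarity in question is the cosine of the angle between uncentred sample vectors (our reading; the text does not spell the measure out here). The sample values and the names `cosine_similarity`, `sample_a`, `sample_b` and `sample_c` are hypothetical. Two samples with identical proportions but different totals then have similarity 1, and non-centred PCA amounts to an SVD of the data matrix with no column means subtracted.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two sample vectors (no centring applied)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical amounts of p = 3 constituents in three samples.
sample_a = np.array([2.0, 3.0, 5.0])   # proportions 0.2, 0.3, 0.5
sample_b = 4.0 * sample_a              # same proportions, larger total amount
sample_c = np.array([5.0, 3.0, 2.0])   # different proportions

sim_ab = cosine_similarity(sample_a, sample_b)  # identical proportions
sim_ac = cosine_similarity(sample_a, sample_c)  # differing proportions

# Non-centred PCA: decompose X itself rather than the column-centred matrix.
X = np.vstack([sample_a, sample_b, sample_c])
U, s, Vt = np.linalg.svd(X, full_matrices=False)
uncentred_eigenvalues = s ** 2         # eigenvalues of X'X
```

Because `sample_b` is a multiple of `sample_a`, the two are indistinguishable under this measure, which is the behaviour wanted when only proportions matter.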
                                The technique of empirical orthogonal teleconnections (van den Dool
                              et al., 2000), described in Section 11.2.3, operates on uncentred data.
                              Their account is confusing, however, because it refers to uncentred sums of squares and
                              cross-products as ‘variances’ and ‘correlations.’ Devijver and Kittler (1982,
                              Section 9.3) use similar misleading terminology in a population derivation
                              and discussion of uncentred PCA.
                                Doubly centred PCA was proposed by Buckland and Anderson (1985) as
                              another method of analysis for data that consist of species counts at various
                              sites. They argue that centred PCA of such data may be dominated by
                              a ‘size’ component, which measures the relative abundance of the various
                              species. It is possible to simply ignore the first PC, and concentrate on later
                              PCs, but an alternative is provided by double centering, which ‘removes’ the
                              ‘size’ PC. The same idea has been suggested in the analysis of size/shape
                              data (see Section 13.2). Double centering introduces a component with zero
                              eigenvalue, because the constraint $x_{i1} + x_{i2} + \cdots + x_{ip} = 0$ now holds for all $i$.
                              A further alternative for removing the ‘size’ effect of different abundances
                              of different species is, for some such data sets, to record only whether a
                              species is present or absent at each site, rather than the actual counts for
                              each species.
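Double centering and the zero eigenvalue it introduces can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Buckland and Anderson (1985): the counts are simulated, and the function name `double_centre` is hypothetical. Each entry has its row mean and column mean subtracted and the grand mean added back, so that every row and every column of the result sums to zero.

```python
import numpy as np

def double_centre(X):
    """Subtract row means and column means, then add back the grand mean."""
    X = np.asarray(X, dtype=float)
    row_means = X.mean(axis=1, keepdims=True)
    col_means = X.mean(axis=0, keepdims=True)
    return X - row_means - col_means + X.mean()

# Simulated (hypothetical) counts: 5 sites (rows) by 4 species (columns),
# with very different average abundances per species.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[20.0, 5.0, 1.0, 10.0], size=(5, 4)).astype(float)

Z = double_centre(counts)
row_sums = Z.sum(axis=1)   # each is zero up to rounding error
col_sums = Z.sum(axis=0)   # each is zero up to rounding error

# Singular values are returned in descending order; the last one is
# numerically zero because the rows of Z all sum to zero.
singular_values = np.linalg.svd(Z, compute_uv=False)
```

The zero row sums mean that the vector of ones is annihilated by `Z`, which is why one singular value (and hence one eigenvalue of the associated cross-product matrix) vanishes.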
                                In fact, what is being done in double centering is the same as Mandel’s
                              (1971, 1972) approach to data in a two-way analysis of variance (see Section
                              13.4). It removes main effects due to rows/observations/sites, and due
                              to columns/variables/species, and concentrates on the interaction between
                              species and sites. In the regression context, Hoerl et al. (1985) suggest
                              that double centering can remove ‘non-essential ill-conditioning,’ which is
                              caused by the presence of a row (observation) effect in the original data.
                              Kazmierczak (1985) advocates a logarithmic transformation of data, followed
                              by double centering. This gives a procedure that is invariant to pre-
                              and post-multiplication of the data matrix by diagonal matrices. Hence it
                              is invariant to different weightings of observations and to different scalings
                              of the variables.
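The invariance can be checked numerically. After taking logarithms, multiplying row i by a weight and column j by a scale adds a term depending only on i and a term depending only on j, and double centering removes exactly such additive row and column effects. The sketch below assumes strictly positive data and positive diagonal entries (so that logarithms exist); the simulated values and the name `log_double_centre` are hypothetical.

```python
import numpy as np

def log_double_centre(X):
    """Take logs, then remove row means and column means (adding back the grand mean)."""
    L = np.log(np.asarray(X, dtype=float))
    return (L - L.mean(axis=1, keepdims=True)
              - L.mean(axis=0, keepdims=True) + L.mean())

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, size=(6, 3))          # strictly positive data

row_weights = rng.uniform(0.5, 2.0, size=6)      # weightings of observations
col_scales = rng.uniform(0.1, 100.0, size=3)     # scalings of variables

# Pre- and post-multiply by diagonal matrices.
X_rescaled = np.diag(row_weights) @ X @ np.diag(col_scales)

# The transformed matrices agree up to rounding error.
invariant = np.allclose(log_double_centre(X), log_double_centre(X_rescaled))
```

Any PCA carried out on the transformed matrix therefore gives the same result whichever positive weights and scales are applied beforehand.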
                                One reason for the suggestion of both non-centred and doubly-centred
                              PCA for counts of species at various sites is perhaps that it is not entirely