Page 425 - Jolliffe I. Principal Component Analysis
14. Generalizations and Adaptations of Principal Component Analysis
Reyment and Jöreskog (1993, Section 8.7) discuss an application of the
method (which they refer to as Imbrie’s Q-mode method) in a similar con-
text concerning the abundance of various marine micro-organisms in cores
taken at a number of sites on the seabed. The same authors also suggest
that this type of analysis is relevant for data where the p variables are
amounts of p chemical constituents in n soil or rock samples. If the degree
to which two samples have the same proportions of each constituent is con-
sidered to be an important index of similarity between samples, then the
similarity measure implied by non-centred PCA is appropriate (Reyment
and Jöreskog, 1993, Section 5.4). An alternative approach if proportions are
of interest is to reduce the data to compositional form (see Section 13.3).
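To illustrate why an uncentred analysis suits data in which proportions matter, the following minimal sketch compares samples using the cosine of the angle between their uncentred rows. Scaling each row to unit length is an assumption of this sketch, not something stated in the passage, and the data and the function names `cosine_similarity` and `uncentred_pca` are hypothetical.

```python
import numpy as np

def cosine_similarity(X):
    """Cosine of the angle between each pair of rows of X, with no centring."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / norms                      # assumption: rows scaled to unit length
    return U @ U.T

def uncentred_pca(X):
    """PCA of X without subtracting column means, via the SVD of X itself."""
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    loadings = Vt.T                    # eigenvectors of X'X
    eigenvalues = s ** 2               # eigenvalues of X'X (uncentred sums of squares)
    scores = U * s                     # equal to X @ loadings
    return loadings, eigenvalues, scores

# Hypothetical amounts of 3 constituents in 3 samples.
X = np.array([
    [2.0, 1.0, 1.0],   # sample A
    [6.0, 3.0, 3.0],   # sample B: same proportions as A, three times the amount
    [1.0, 1.0, 2.0],   # sample C: different proportions
])

S = cosine_similarity(X)
loadings, eigenvalues, scores = uncentred_pca(X)
print(S[0, 1])   # A and B are treated as essentially identical
print(S[0, 2])   # A and C are less similar
```

Under this assumed normalisation, two samples with identical proportions have similarity 1 regardless of their total amounts, which is the property the passage describes as desirable.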
The technique of empirical orthogonal teleconnections (van den Dool
et al., 2000), described in Section 11.2.3, operates on uncentred data.
Here the terminology is confusing, because uncentred sums of squares and
cross-products are called ‘variances’ and ‘correlations.’ Devijver and Kittler (1982,
Section 9.3) use similar misleading terminology in a population derivation
and discussion of uncentred PCA.
Doubly centred PCA was proposed by Buckland and Anderson (1985) as
another method of analysis for data that consist of species counts at various
sites. They argue that centred PCA of such data may be dominated by
a ‘size’ component, which measures the relative abundance of the various
species. It is possible to simply ignore the first PC, and concentrate on later
PCs, but an alternative is provided by double centering, which ‘removes’ the
‘size’ PC. The same idea has been suggested in the analysis of size/shape
data (see Section 13.2). Double centering introduces a component with zero
eigenvalue, because the constraint $x_{i1} + x_{i2} + \cdots + x_{ip} = 0$ now holds for all $i$.
A further alternative for removing the ‘size’ effect of different abundances
of different species is, for some such data sets, to record only whether a
species is present or absent at each site, rather than the actual counts for
each species.
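The double centering described above can be sketched as follows. This is a minimal illustration on hypothetical counts (5 sites by 4 species), not the procedure of Buckland and Anderson (1985) in detail; the function name `double_centre` and the use of the divisor n - 1 are assumptions of the sketch.

```python
import numpy as np

def double_centre(X):
    """Subtract row means and column means, then add back the grand mean."""
    X = np.asarray(X, dtype=float)
    row_means = X.mean(axis=1, keepdims=True)
    col_means = X.mean(axis=0, keepdims=True)
    return X - row_means - col_means + X.mean()

# Hypothetical species counts: rows are sites, columns are species.
counts = np.array([
    [12.0, 3.0, 0.0,  5.0],
    [30.0, 8.0, 2.0,  9.0],
    [ 7.0, 1.0, 4.0,  2.0],
    [18.0, 6.0, 1.0, 11.0],
    [25.0, 2.0, 3.0,  4.0],
])

Z = double_centre(counts)
row_sums = Z.sum(axis=1)   # each row now sums to zero (up to rounding)
col_sums = Z.sum(axis=0)   # each column also sums to zero

# Covariance matrix of the doubly centred data and its eigenvalues (ascending).
S = Z.T @ Z / (Z.shape[0] - 1)
eigenvalues = np.linalg.eigvalsh(S)
print(eigenvalues[0])      # smallest eigenvalue, zero up to rounding error
```

Because every row of the doubly centred matrix sums to zero, the vector of ones is an eigenvector of the covariance matrix with eigenvalue zero, which is the zero-eigenvalue component mentioned in the text.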
In fact, what is being done in double centering is the same as Mandel’s
(1971, 1972) approach to data in a two-way analysis of variance (see Sec-
tion 13.4). It removes main effects due to rows/observations/sites, and due
to columns/variables/species, and concentrates on the interaction between
species and sites. In the regression context, Hoerl et al. (1985) suggest
that double centering can remove ‘non-essential ill-conditioning,’ which is
caused by the presence of a row (observation) effect in the original data.
Kazmierczak (1985) advocates a logarithmic transformation of data, fol-
lowed by double centering. This gives a procedure that is invariant to pre-
and post-multiplication of the data matrix by diagonal matrices. Hence it
is invariant to different weightings of observations and to different scalings
of the variables.
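The invariance property can be checked numerically. The sketch below assumes strictly positive data and positive diagonal entries, so that logarithms exist; the function name `log_double_centre` and the simulated data are hypothetical, and the handling of zero counts is not addressed here.

```python
import numpy as np

def log_double_centre(X):
    """Take logs, then remove row means and column means and add the grand mean."""
    L = np.log(np.asarray(X, dtype=float))
    return (L
            - L.mean(axis=1, keepdims=True)
            - L.mean(axis=0, keepdims=True)
            + L.mean())

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(5, 4))      # hypothetical positive data matrix
row_weights = rng.uniform(0.5, 2.0, size=5)  # hypothetical weights for observations
col_scales = rng.uniform(0.5, 2.0, size=4)   # hypothetical scalings for variables

# Pre- and post-multiply by positive diagonal matrices.
X_scaled = np.diag(row_weights) @ X @ np.diag(col_scales)

unchanged = np.allclose(log_double_centre(X), log_double_centre(X_scaled))
print(unchanged)
```

The check works because the logarithm turns the row weight and column scaling into additive row and column effects, which double centering then removes.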
One reason for the suggestion of both non-centred and doubly-centred
PCA for counts of species at various sites is perhaps that it is not entirely

