Page 426 - Jolliffe I. Principal Component Analysis


14.2. Weights, Metrics, Transformations and Centerings
clear which of ‘sites’ and ‘species’ should be treated as ‘variables’ and which
as ‘observations.’ Another possibility is to centre with respect to sites, but
not species, in other words, carrying out an analysis with sites rather than
species as the variables. Buckland and Anderson (1985) analyse their data
in this way.
Yet another technique which has been suggested for analysing some
types of site-species data is correspondence analysis (see, for example,
Section 5.4.1 and Gauch, 1982). As pointed out in Section 13.4, correspondence
analysis has some similarity to Mandel’s approach, and hence to doubly
centred PCA. In doubly centred PCA we analyse the residuals from an
additive model for row and column (site and species) effects, whereas in
correspondence analysis the residuals from a multiplicative (independence)
model are considered.
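
The contrast between the two sets of residuals can be sketched numerically. The following is a minimal illustration only, assuming NumPy; the function names and the small site-by-species table `counts` are hypothetical and do not come from the text. The first function removes additive row and column effects (double centring), while the second forms the standardized residuals from the independence model in the form commonly used in correspondence analysis.

```python
import numpy as np

def doubly_centred_residuals(X):
    """Residuals from the additive model x_ij ~ mu + row_i + col_j."""
    X = np.asarray(X, dtype=float)
    row_means = X.mean(axis=1, keepdims=True)
    col_means = X.mean(axis=0, keepdims=True)
    # Subtract row and column means, add back the grand mean
    return X - row_means - col_means + X.mean()

def independence_residuals(N):
    """Standardized residuals from the multiplicative (independence)
    model p_ij ~ r_i * c_j. Assumes no row or column sums to zero."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # table of proportions
    r = P.sum(axis=1, keepdims=True)     # row (site) masses
    c = P.sum(axis=0, keepdims=True)     # column (species) masses
    return (P - r * c) / np.sqrt(r * c)

# Hypothetical abundance counts: 4 sites (rows) by 3 species (columns)
counts = np.array([[10, 2, 0],
                   [4, 6, 3],
                   [1, 5, 9],
                   [0, 3, 12]])

A = doubly_centred_residuals(counts)   # rows and columns each sum to zero
R = independence_residuals(counts)     # zero for a perfectly independent table
```

A singular value decomposition of `A` would correspond to doubly centred PCA and one of `R` to correspondence analysis; a table that is exactly additive gives `A` equal to zero, whereas a table that is exactly multiplicative gives `R` equal to zero.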
Both uncentred and doubly centred PCA perform eigenanalyses on
matrices whose elements are not covariances or correlations, but which can
still be viewed as measures of similarity or association between pairs of
variables. Another technique in the same vein is proposed by Elmore and
Richman (2001). Their idea is to find ‘distances’ between variables which
can then be converted into similarities and an eigenanalysis done on the
resulting similarity matrix. Although Elmore and Richman (2001) note
a number of possible distance measures, they concentrate on Euclidean
distance, so that the distance $d_{jk}$ between variables $j$ and $k$ is

\[
\left[ \sum_{i=1}^{n} (x_{ij} - x_{ik})^2 \right]^{1/2}.
\]

If $D$ is largest of the $p^2$ $d_{jk}$, the corresponding similarity matrix is defined
to have elements

\[
s_{jk} = 1 - \frac{d_{jk}}{D}.
\]
The procedure is referred to as PCA based on ES (Euclidean similarity).
There is an apparent connection with principal coordinate analysis
(Section 5.2) but for ES-based PCA it is distances between variables, rather
than between observations, that are analysed.
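
The ES construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Elmore and Richman's own implementation: it assumes NumPy, an (n x p) data matrix with observations in rows and variables in columns, and the function names `euclidean_similarity` and `es_pca` are hypothetical. The data are simulated purely for demonstration.

```python
import numpy as np

def euclidean_similarity(X):
    """Return the p x p matrix with elements s_jk = 1 - d_jk / D,
    where d_jk is the Euclidean distance between columns j and k."""
    X = np.asarray(X, dtype=float)
    # diffs[i, j, k] = x_ij - x_ik
    diffs = X[:, :, None] - X[:, None, :]
    d = np.sqrt((diffs ** 2).sum(axis=0))
    D = d.max()                      # largest of the p^2 distances
    if D == 0:
        raise ValueError("all variables are identical, so D = 0")
    return 1.0 - d / D

def es_pca(X):
    """Eigenanalysis of the similarity matrix, largest eigenvalue first."""
    S = euclidean_similarity(X)
    eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Simulated data: n = 50 observations on p = 4 variables in the same units
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

S = euclidean_similarity(X)
eigvals, eigvecs = es_pca(X)
```

By construction `S` has ones on its diagonal and a zero for the most distant pair of variables, so its eigenvalues sum to p, as they do for a correlation matrix.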
The technique is only appropriate if the variables are all measured in the
same units—it makes no sense to compute a distance between a vector of
temperatures and a vector of heights, for example. Elmore and Richman
(2001) report that the method does better at finding known ‘modes’ in a
data set than PCA based on either a covariance or a correlation matrix.
However, as with uncentred and doubly centred PCA, it is much less clear
than it is for PCA what is optimized by the technique, and hence it is more
difficult to know how to interpret its results.
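
The restriction to variables measured in the same units can be seen in a small numerical sketch. The example below assumes NumPy and uses simulated, purely hypothetical temperature and height series; the helper `es_matrix` is an illustrative name. Re-expressing the heights in centimetres rather than metres changes the similarity matrix, so the resulting analysis depends on an arbitrary choice of units.

```python
import numpy as np

def es_matrix(X):
    """Euclidean similarity s_jk = 1 - d_jk / D between the columns of X."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, :, None] - X[:, None, :]
    d = np.sqrt((diffs ** 2).sum(axis=0))
    return 1.0 - d / d.max()

rng = np.random.default_rng(1)
temps = 15.0 + rng.normal(size=(30, 2))            # two temperature series
heights_m = 1.7 + 0.1 * rng.normal(size=(30, 1))   # one height series, in metres

S_m = es_matrix(np.hstack([temps, heights_m]))           # heights in metres
S_cm = es_matrix(np.hstack([temps, 100.0 * heights_m]))  # heights in centimetres

# True here: the similarities depend on the units chosen for height
units_matter = not np.allclose(S_m, S_cm)
```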
