Page 426 - Jolliffe I. Principal Component Analysis

14.2. Weights, Metrics, Transformations and Centerings
                              clear which of ‘sites’ and ‘species’ should be treated as ‘variables’ and which
                              as ‘observations.’ Another possibility is to centre with respect to sites, but
                              not species, in other words, carrying out an analysis with sites rather than
                              species as the variables. Buckland and Anderson (1985) analyse their data
                              in this way.
                                Yet another technique which has been suggested for analysing some
                              types of site-species data is correspondence analysis (see, for example,
                              Section 5.4.1 and Gauch, 1982). As pointed out in Section 13.4, correspondence
                              analysis has some similarity to Mandel’s approach, and hence to doubly
                              centred PCA. In doubly centred PCA we analyse the residuals from an
                              additive model for row and column (site and species) effects, whereas in
                              correspondence analysis the residuals from a multiplicative (independence)
                              model are considered.
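
The contrast between the two kinds of residuals can be made concrete with a minimal NumPy sketch. It assumes a small sites-by-species table of counts; the array `abundance` and the helper names `double_centred_residuals` and `independence_residuals` are illustrative and do not come from the text. The second helper uses the standardized residuals that are commonly decomposed in correspondence analysis, which is one conventional scaling rather than the only possible one.

```python
import numpy as np

def double_centred_residuals(X):
    """Residuals from the additive model x_ij = mu + row_i + col_j + e_ij."""
    X = np.asarray(X, dtype=float)
    row_means = X.mean(axis=1, keepdims=True)
    col_means = X.mean(axis=0, keepdims=True)
    # Subtract row and column means, then add back the grand mean
    return X - row_means - col_means + X.mean()

def independence_residuals(N):
    """Standardized residuals from the multiplicative (independence) model
    p_ij = r_i * c_j for a table of non-negative counts N."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # table of proportions
    r = P.sum(axis=1, keepdims=True)     # row masses, shape (n, 1)
    c = P.sum(axis=0, keepdims=True)     # column masses, shape (1, p)
    expected = r @ c                     # fitted values under independence
    return (P - expected) / np.sqrt(expected)

# Hypothetical 4 sites x 3 species table of counts (illustrative only)
abundance = np.array([[10, 2, 0],
                      [4, 6, 3],
                      [1, 5, 9],
                      [0, 3, 12]])

additive = double_centred_residuals(abundance)
multiplicative = independence_residuals(abundance)

# A singular value decomposition of either residual matrix gives the
# corresponding analysis (doubly centred PCA or correspondence analysis).
sv_additive = np.linalg.svd(additive, compute_uv=False)
sv_multiplicative = np.linalg.svd(multiplicative, compute_uv=False)
print(sv_additive)
print(sv_multiplicative)
```

In both cases the residual matrix loses one dimension to the fitted model, so the smallest singular value is zero up to rounding error.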
                                Both uncentred and doubly centred PCA perform eigenanalyses on matrices
                              whose elements are not covariances or correlations, but which can
                              still be viewed as measures of similarity or association between pairs of
                              variables. Another technique in the same vein is proposed by Elmore and
                              Richman (2001). Their idea is to find ‘distances’ between variables which
                              can then be converted into similarities and an eigenanalysis done on the
                              resulting similarity matrix. Although Elmore and Richman (2001) note
                              a number of possible distance measures, they concentrate on Euclidean
                              distance, so that the distance $d_{jk}$ between variables $j$ and $k$ is

                              $$ d_{jk} = \left[ \sum_{i=1}^{n} (x_{ij} - x_{ik})^2 \right]^{1/2}. $$

                              If $D$ is the largest of the $p^2$ values $d_{jk}$, the corresponding similarity
                              matrix is defined to have elements

                              $$ s_{jk} = 1 - \frac{d_{jk}}{D}. $$
                              The procedure is referred to as PCA based on ES (Euclidean similarity).
                              There is an apparent connection with principal coordinate analysis
                              (Section 5.2) but for ES-based PCA it is distances between variables, rather
                              than between observations, that are analysed.
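
The steps of ES-based PCA described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not Elmore and Richman's implementation: the function names `euclidean_similarity` and `es_pca` are hypothetical, the data matrix `X` is simulated, and the variables (columns) are assumed to share the same units. No preprocessing beyond what the text describes is applied.

```python
import numpy as np

def euclidean_similarity(X):
    """Similarity matrix with elements s_jk = 1 - d_jk / D, where d_jk is the
    Euclidean distance between columns (variables) j and k of the n x p
    matrix X, and D is the largest of the p^2 distances."""
    X = np.asarray(X, dtype=float)
    diff = X[:, :, None] - X[:, None, :]      # shape (n, p, p)
    d = np.sqrt((diff ** 2).sum(axis=0))      # d[j, k], shape (p, p)
    D = d.max()
    if D == 0:
        raise ValueError("all variables are identical; D is zero")
    return 1.0 - d / D

def es_pca(X):
    """Eigenanalysis of the Euclidean similarity matrix, with eigenvalues
    returned in decreasing order alongside their eigenvectors."""
    S = euclidean_similarity(X)
    eigvals, eigvecs = np.linalg.eigh(S)      # S is symmetric
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Simulated data: n = 50 observations on p = 4 variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

eigvals, eigvecs = es_pca(X)
print(eigvals)
```

Because every variable is at distance zero from itself, the diagonal of the similarity matrix consists of ones, so the eigenvalues sum to $p$, as they do for a correlation matrix.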
                                The technique is only appropriate if the variables are all measured in the
                              same units—it makes no sense to compute a distance between a vector of
                              temperatures and a vector of heights, for example. Elmore and Richman
                              (2001) report that the method does better at finding known ‘modes’ in a
                              data set than PCA based on either a covariance or a correlation matrix.
                              However, as with uncentred and doubly centred PCA, it is much less clear
                              than it is for PCA what is optimized by the technique, and hence it is more
                              difficult to know how to interpret its results.