Page 138 - Jolliffe I. Principal Component Analysis

                                There are a number of connections between PCA and the other
                              techniques (links with principal coordinate analysis and biplots have
                              already been discussed, while those with correspondence analysis are
                              deferred until Section 13.1), but for most data sets one method is more appropriate
                              than the others. Contingency table data imply correspondence analysis,
                              and similarity or dissimilarity matrices suggest principal coordinate analy-
                              sis, whereas PCA is defined for ‘standard’ data matrices of n observations
                              on p variables. Notwithstanding these distinctions, different techniques
                              have been used on the same data sets and a number of empirical compar-
                              isons have been reported in the ecological literature. Digby and Kempton
                              (1987, Section 4.3) compare twelve ordination methods, including principal
                              coordinate analysis, with five different similarity measures and correspon-
                              dence analysis, on both species abundances and presence/absence data.
                              The comparison is by means of a second-level ordination based on simi-
                              larities between the results of the twelve methods. Gauch (1982, Chapter
                              4) discusses criteria for choosing an appropriate ordination technique for
                              ecological data, and in Gauch (1982, Chapter 3) a number of studies are
                              described which compare PCA with other techniques, including correspon-
                              dence analysis, on simulated data. The data are generated to have a similar
                              structure to that expected in some types of ecological data, with added
                              noise, and investigations are conducted to see which techniques are ‘best’
                              at recovering the structure. However, as with the comparisons between PCA
                              and correspondence analysis given by Greenacre (1994, Section 9.6), it is
                              open to debate whether all of the techniques compared are relevant to the
                              data analysed. Different techniques implicitly assume that different types
                              of structure or model are of interest for the data (see Section 14.2.3 for
                              some further
                              possibilities) and which technique is most appropriate will depend on which
                              type of structure or model is relevant.
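
                              The idea of a second-level ordination mentioned above can be sketched in a
                              few lines. This is only an illustration under stated assumptions, not a
                              reconstruction of Digby and Kempton's procedure: the abundance data are
                              simulated, only four methods are compared, and the Procrustes disparity is
                              assumed as the measure of (dis)similarity between two configurations. All
                              function names are hypothetical helpers.

```python
import numpy as np

def pca_scores(X, k=2, standardize=False):
    """First k PC scores of X (covariance PCA, or correlation PCA if standardize)."""
    Z = X - X.mean(axis=0)
    if standardize:
        Z = Z / Z.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]

def pairwise_distances(X, metric):
    """Distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")

def classical_mds(D, k=2):
    """Principal coordinate analysis of a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                   # doubly centred matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]            # k largest eigenvalues
    vals_k = np.clip(vals[order], 0.0, None)      # guard against small negatives
    return vecs[:, order] * np.sqrt(vals_k)

def procrustes_disparity(A, B):
    """Residual sum of squares after centring, scaling and optimally rotating B onto A."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return 1.0 - s.sum() ** 2

# Simulated 'species abundance' data (an assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.poisson(lam=5, size=(30, 6)).astype(float)

# First level: several ordinations of the same data.
configs = {
    "pca_cov": pca_scores(X),
    "pca_corr": pca_scores(X, standardize=True),
    "pcoa_euclid": classical_mds(pairwise_distances(X, "euclidean")),
    "pcoa_manhattan": classical_mds(pairwise_distances(X, "manhattan")),
}
names = list(configs)
m = len(names)

# Dissimilarity between every pair of ordination results.
disparity = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        disparity[i, j] = procrustes_disparity(configs[names[i]], configs[names[j]])
disparity = np.clip((disparity + disparity.T) / 2, 0.0, None)
np.fill_diagonal(disparity, 0.0)

# Second level: ordinate the methods themselves. The disparity is a sum of
# squares, so its square root is used as the distance.
second_level = classical_mds(np.sqrt(disparity), k=2)
for name, coords in zip(names, second_level):
    print(f"{name:15s} {coords[0]: .3f} {coords[1]: .3f}")
```

                              In this sketch, covariance-based PCA and principal coordinate analysis
                              on Euclidean distances yield the same configuration up to rotation or
                              reflection, so their disparity is essentially zero and they coincide in the
                              second-level plot. Methods resting on different assumptions fall further
                              apart.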



                              5.6 Methods for Graphical Display of Intrinsically
                                    High-Dimensional Data


                              Sometimes it will not be possible to reduce a data set’s dimensionality
                              to two or three without a substantial loss of information; in such cases,
                              methods for displaying many variables simultaneously in two dimensions
                              may be useful. Plots of trigonometric functions due to Andrews (1972),
                              illustrated below, and the display in terms of faces suggested by Chernoff
                              (1973), for which several examples are given in Wang (1978), became pop-
                              ular in the 1970s and 1980s. There are many other possibilities (see, for
                              example, Tukey and Tukey (1981) and Carr (1998)) which will not be discussed
                              here. Recent developments in the visualization of high-dimensional
                              data using the ever-increasing power of computers have created displays
                              which are dynamic, colourful and potentially highly informative, but there