Page 137 - Jolliffe I. Principal Component Analysis

5. Graphical Representation of Data Using Principal Components
                              sites in the south west of Ireland, and the group {43, 168, 169, 171, 172} in
                              the bottom right of the diagram are all coastal sites in the south and east.
                                If we look at species, rather than sites, we find that similar species tend
                              to be located in the same part of Figure 5.6. For example, three of the
                              four species of goose which were recorded are in the bottom-right of the
                              diagram (BG, WG, GG).
                                Turning to the simultaneous positions of species and sites, the Greylag
                              Goose (GG) and Barnacle Goose (BG) were only recorded at site
                              171, among those sites which are numbered on Figure 5.6. On the plot,
                              site 171 is closest in position of any site to the positions of these two
                              species. The Whitefronted Goose (WG) is recorded at sites 171 and 172
                              only, the Gadwall (GA) at sites 43, 103, 168, 169, 172 among those labelled
                              on the diagram, and the Common Sandpiper (CS) at all sites in
                              the coastal group {43, 168, 169, 171, 172}, but at only one of the inland
                              group {50, 53, 103, 155, 156, 235}. Again, these occurrences might be predicted
                              from the relative positions of the sites and species on the plot.
                              However, simple predictions are not always valid, as the Coot (CO), whose
                              position on the plot is in the middle of the inland sites, is recorded at all
                              11 sites numbered on the figure.


                              5.5 Comparisons Between Principal Coordinates,
                                    Biplots, Correspondence Analysis and Plots
                                    Based on Principal Components


                              For most purposes there is little point in asking which of the graphical
                              techniques discussed so far in this chapter is ‘best.’ This is because they are
                              either equivalent, as is the case of PCs and principal coordinates for some
                              types of similarity matrix, so any comparison is trivial, or the data set is of
                              a type such that one or more of the techniques are not really appropriate,
                              and so should not be compared with the others. For example, if the data
                              are in the form of a contingency table, then correspondence analysis is
                              clearly relevant, but the use of the other techniques is more questionable.
                              As demonstrated by Gower and Hand (1996) and Gabriel (1995a,b), the
                              biplot is not restricted to ‘standard’ (n × p) data matrices, and could be
                              used on any two-way array of data. The simultaneous positions of the $g_i^*$
                              and $h_j^*$ still have a similar interpretation to that discussed in Section 5.3,
                              even though some of the separate properties of the $g_i^*$ and $h_j^*$, for instance,
                              those relating to variances and covariances, are clearly no longer valid. A
                              contingency table could also be analysed by PCA, but this is not really
                              appropriate, as it is not at all clear what interpretation could be given
                              to the results. Principal coordinate analysis needs a similarity or distance
                              matrix, so it is hard to see how it could be used directly on a contingency
                              table.
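
The equivalence between PCs and principal coordinates noted above can be checked numerically in one simple case. The following is a minimal sketch, not taken from the text: it assumes that the dissimilarities are ordinary Euclidean distances between the rows of a column-centred data matrix, and it uses a small synthetic data set. The function names `pca_scores` and `principal_coordinates` are illustrative only.

```python
import numpy as np

def pca_scores(X, k=2):
    """Scores on the first k PCs of the column-centred data matrix X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def principal_coordinates(D, k=2):
    """Classical principal coordinate analysis of a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # doubly centred matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # indices of the k largest
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Synthetic data (an assumption for illustration): 12 observations, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))

# Euclidean distances between observations.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

Z_pca = pca_scores(X)
Z_pco = principal_coordinates(D)

# Each axis is determined only up to sign, so compare absolute values.
same = np.allclose(np.abs(Z_pca), np.abs(Z_pco), atol=1e-8)
print(same)
```

With other dissimilarity measures the two configurations will in general differ, which is why the equivalence holds only for some types of similarity matrix.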