Page 137 - Jolliffe I. Principal Component Analysis


5. Graphical Representation of Data Using Principal Components
sites in the south west of Ireland, and the group {43, 168, 169, 171, 172} in
the bottom right of the diagram are all coastal sites in the south and east.
If we look at species, rather than sites, we find that similar species tend
to be located in the same part of Figure 5.6. For example, three of the
four species of goose which were recorded are in the bottom-right of the
diagram (BG, WG, GG).
Turning to the simultaneous positions of species and sites, the Greylag
Goose (GG) and Barnacle Goose (BG) were only recorded at site
171, among those sites which are numbered on Figure 5.6. On the plot,
site 171 is closest in position of any site to the positions of these two
species. The Whitefronted Goose (WG) is recorded at sites 171 and 172
only, the Gadwall (GA) at sites 43, 103, 168, 169, 172 among those labelled
on the diagram, and the Common Sandpiper (CS) at all sites in
the coastal group {43, 168, 169, 171, 172}, but at only one of the inland
group {50, 53, 103, 155, 156, 235}. Again, these occurrences might be predicted
from the relative positions of the sites and species on the plot.
However, simple predictions are not always valid, as the Coot (CO), whose
position on the plot is in the middle of the inland sites, is recorded at all
11 sites numbered on the figure.

5.5 Comparisons Between Principal Coordinates,
Biplots, Correspondence Analysis and Plots
Based on Principal Components

For most purposes there is little point in asking which of the graphical
techniques discussed so far in this chapter is ‘best.’ This is because they are
either equivalent, as is the case of PCs and principal coordinates for some
types of similarity matrix, so any comparison is trivial, or the data set is of
a type such that one or more of the techniques are not really appropriate,
and so should not be compared with the others. For example, if the data
are in the form of a contingency table, then correspondence analysis is
clearly relevant, but the use of the other techniques is more questionable.
As demonstrated by Gower and Hand (1996) and Gabriel (1995a,b), the
biplot is not restricted to ‘standard’ (n × p) data matrices, and could be
used on any two-way array of data. The simultaneous positions of the $\mathbf{g}_i^*$
and $\mathbf{h}_j^*$ still have a similar interpretation to that discussed in Section 5.3,
even though some of the separate properties of the $\mathbf{g}_i^*$ and $\mathbf{h}_j^*$, for instance,
those relating to variances and covariances, are clearly no longer valid. A
contingency table could also be analysed by PCA, but this is not really
appropriate, as it is not at all clear what interpretation could be given
to the results. Principal coordinate analysis needs a similarity or distance
matrix, so it is hard to see how it could be used directly on a contingency
table.
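
The equivalence of PCs and principal coordinates mentioned above can be checked numerically in one well-known case: when the distance matrix holds ordinary Euclidean distances between the rows of the data matrix. The following is a minimal sketch under that assumption, using NumPy and synthetic data. The function names `pca_scores` and `principal_coordinates`, the random data and the tolerance are illustrative choices, not taken from the text.

```python
import numpy as np


def pca_scores(X, k=2):
    """First k PC scores, computed from the column-centred data via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def principal_coordinates(D, k=2):
    """First k principal coordinates (classical scaling) of a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # doubly centred inner products
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]        # indices of the k largest
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))


# Synthetic data, for illustration only (assumption: not from the book).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))

# Euclidean distances between all pairs of rows of X.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

Z_pca = pca_scores(X)
Z_pco = principal_coordinates(D)

# Each axis is determined only up to sign, so compare absolute values.
same_configuration = np.allclose(np.abs(Z_pca), np.abs(Z_pco), atol=1e-8)
print(same_configuration)
```

With other dissimilarity measures the two configurations need not coincide, which is why the comparison is trivial only for some types of similarity matrix.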
