Page 133 - Jolliffe I. Principal Component Analysis
5. Graphical Representation of Data Using Principal Components
of $\mathbf{X}'\mathbf{X}$, and hence to PCA. The variations discussed so far relate only to
choices within this classical plot, for example the choice of $\alpha$ in defining $\mathbf{g}_i$
and $\mathbf{h}_j$ (5.3.2), the possible rescaling by a factor $(n-1)^{1/2}$ and the form
of display (axes or arrowheads, concentration ellipses).
Gower and Hand (1996) describe many other variations. In particular,
they look at biplots related to multivariate techniques other than PCA,
including multidimensional scaling, canonical variate analysis, correspondence
analysis and multiple correspondence analysis. Gabriel (1995a,b) also
discusses biplots related to multivariate methods other than PCA, in particular
multiple correspondence analysis and MANOVA (multivariate analysis
of variance).
A key distinction drawn by Gower and Hand (1996) is between interpolation
and prediction in a biplot. The former is concerned with determining
where in the diagram to place an observation, given its values on the measured
variables. Prediction refers to estimating the values of these variables,
given the position of an observation in the plot. Both are straightforward
for classical biplots ($\mathbf{g}_i^*$ is used for interpolation and ${}_2\tilde{x}_{ij}$ for prediction)
but become more complicated for other varieties of biplot. Gower and Hand
(1996, Chapter 7) describe a framework for generalized biplots that includes
most other versions as special cases. One important special case is that of
non-linear biplots. These will be discussed further in Section 14.1, which
describes a number of non-linear modifications of PCA. Similarly, discussion
of robust biplots, due to Daigle and Rivest (1992), will be deferred
until Section 10.4, which covers robust versions of PCA.
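For the classical biplot, interpolation and prediction can be sketched numerically as follows. This is a minimal illustration rather than code from the text. It assumes the factorization $\mathbf{G} = \mathbf{U}\mathbf{L}^{\alpha}$, $\mathbf{H} = \mathbf{A}\mathbf{L}^{1-\alpha}$ of the SVD $\mathbf{X} = \mathbf{U}\mathbf{L}\mathbf{A}'$ referred to in (5.3.2), picks $\alpha = 1$ arbitrarily, and uses synthetic data. The function names `interpolate` and `predict` are hypothetical.

```python
import numpy as np

# Synthetic data (an assumption for illustration), column-centred.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 4))
X = X_raw - X_raw.mean(axis=0)

# SVD X = U L A'; `s` holds the diagonal of L, `At` is A'.
U, s, At = np.linalg.svd(X, full_matrices=False)

alpha = 1.0                              # assumed choice of alpha
G2 = U[:, :2] * s[:2] ** alpha           # rows are g_i^* (observation markers)
H2 = At[:2].T * s[:2] ** (1 - alpha)     # rows are h_j^* (variable markers)

def interpolate(x_centred):
    """Place a (centred) observation in the 2-D plot, returning g^*."""
    return (x_centred @ At[:2].T) * s[:2] ** (alpha - 1)

def predict(g_star):
    """Estimate the variable values from a plot position via g^*' h_j^*."""
    return g_star @ H2.T

# Interpolating an observed row recovers its plotted marker, and
# predicting from the markers gives the rank-2 approximation of X.
X_rank2 = predict(G2)
```

Here `predict(interpolate(x))` returns the rank-2 approximation to `x`, so the difference from `x` itself indicates how well that observation is represented in the two-dimensional plot.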
The discussion and examples of the classical biplot given above use an
unstandardized form of X and hence are related to covariance matrix PCA.
As noted in Section 2.3 and elsewhere, it is more usual, and often more
appropriate, to base PCA on the correlation matrix as in the examples
of Section 5.3.1. Corresponding biplots can be derived from the SVD of
$\tilde{\mathbf{X}}$, the column-centred data matrix whose $j$th column has been scaled by
dividing by the standard deviation of $x_j$, $j = 1, 2, \ldots, p$. Many aspects of
the biplot remain the same when the correlation, rather than covariance,
matrix is used. The main difference is in the positions of the $\mathbf{h}_j$. Recall that
if $\alpha = 0$ is chosen, together with the scaling factor $(n-1)^{1/2}$, then the squared length
$\mathbf{h}_j^{*\prime}\mathbf{h}_j^*$ approximates the variance of $x_j$. In the case of a correlation-based
analysis, $\mathrm{var}(x_j) = 1$ and the quality of the biplot approximation to the
$j$th variable by the point representing $\mathbf{h}_j^*$ can be judged by the closeness of
$\mathbf{h}_j^*$ to the unit circle centred at the origin. For this reason, the unit circle is
sometimes drawn on correlation biplots to assist in evaluating the quality of
the approximation (Besse, 1994a). Another property of correlation biplots
is that the squared distance between $\mathbf{h}_j$ and $\mathbf{h}_k$ is $2(1 - r_{jk})$, where $r_{jk}$ is
the correlation between $x_j$ and $x_k$. The squared distance between $\mathbf{h}_j^*$ and
$\mathbf{h}_k^*$ approximates this quantity.
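The properties of the correlation biplot described in this paragraph can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's code: it uses synthetic data, standard deviations computed with divisor $n-1$, and takes the $\alpha = 0$ variable markers to be the rows of $\mathbf{A}\mathbf{L}$ divided by $(n-1)^{1/2}$, which is one reading of the rescaling mentioned in the text. The helper `sq_dist` is hypothetical.

```python
import numpy as np

# Synthetic correlated data (an assumption for illustration).
rng = np.random.default_rng(2)
X_raw = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
n, p = X_raw.shape

# Column-centre, then divide each column by its standard deviation.
X_tilde = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
R = np.corrcoef(X_raw, rowvar=False)     # correlation matrix
R_from_X_tilde = X_tilde.T @ X_tilde / (n - 1)

# SVD of the standardized matrix; `At` is A', `s` the singular values.
U, s, At = np.linalg.svd(X_tilde, full_matrices=False)

# alpha = 0 with the (n-1)^{1/2} rescaling: rows of H are the h_j.
H = At.T * s / np.sqrt(n - 1)
H2 = H[:, :2]                            # rows are h_j^* (plotted markers)

sq_len_full = (H ** 2).sum(axis=1)       # equals var(x_j) = 1
sq_len_2d = (H2 ** 2).sum(axis=1)        # at most 1; near 1 means well represented

def sq_dist(M, j, k):
    """Squared Euclidean distance between rows j and k of M."""
    return ((M[j] - M[k]) ** 2).sum()

exact = sq_dist(H, 0, 1)                 # equals 2 * (1 - r_01)
approx = sq_dist(H2, 0, 1)               # 2-D approximation to the same quantity
```

A variable whose entry in `sq_len_2d` is well below 1 plots well inside the unit circle, signalling that the two-dimensional display represents it poorly.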
An alternative to the covariance and correlation biplots is the coefficient
of variation biplot, due to Underhill (1990). As its name suggests, instead

