Page 134 - Jolliffe I. Principal Component Analysis
of dividing the $j$th column of $\mathbf{X}$ by the standard deviation of
$x_j$ to give a correlation biplot, here the $j$th column is divided by the
mean of $x_j$. Of course, this only makes sense for certain types of
non-negative variables, but Underhill (1990) shows that for such variables
the resulting biplot gives a useful view of the data and variables. The
cosines of the angles between the $\mathbf{h}_j^*$ still provide
approximations to the correlations between variables, but the lengths of
the vectors $\mathbf{h}_j^*$ now give information on the variability of the
$x_j$ relative to their means.
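
As an illustration of the last two statements, the following minimal sketch builds column markers from the singular value decomposition of a mean-scaled, column-centred data matrix and checks their lengths and angles. The function name `mean_scaled_biplot_markers`, the simulated gamma data and the scaling by $\sqrt{n-1}$ are assumptions made for this example rather than Underhill's own construction. All dimensions are kept, so the relations hold exactly; in a two-dimensional biplot they hold only approximately.

```python
import numpy as np

def mean_scaled_biplot_markers(X):
    """Row markers G and column markers H for a mean-scaled biplot.

    Hypothetical helper: each column of X is divided by its mean and
    then centred, and the markers come from the SVD of the result.
    All dimensions are retained, so G @ H.T reproduces the scaled data.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = X / X.mean(axis=0) - 1.0           # divide by mean, then centre
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    G = U * np.sqrt(n - 1)                 # rows are the g_i^*
    H = (Vt.T * s) / np.sqrt(n - 1)        # rows are the h_j^*
    return G, H

# Simulated positive data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.gamma(shape=4.0, scale=2.0, size=(50, 4))
G, H = mean_scaled_biplot_markers(X)

# Lengths of the h_j^* versus standard deviation / mean of each variable.
lengths = np.linalg.norm(H, axis=1)
coef_var = X.std(axis=0, ddof=1) / X.mean(axis=0)

# Cosines of angles between the h_j^* versus the correlation matrix.
cosines = (H @ H.T) / np.outer(lengths, lengths)
corr = np.corrcoef(X, rowvar=False)
```

With all dimensions retained, `lengths` equals `coef_var` and `cosines` equals `corr` up to rounding error; keeping only the first two columns of `G` and `H` gives the planar approximation that is actually plotted.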
Finally, the biplot can be adapted to cope with missing values by
introducing weights $w_{ij}$ for each observation $x_{ij}$ when
approximating $x_{ij}$ by $\mathbf{g}_i^{*\prime}\mathbf{h}_j^*$. A weight
of zero is given to missing values and a unit weight to those values which
are present. The appropriate values for $\mathbf{g}_i^*$, $\mathbf{h}_j^*$
can be calculated using an algorithm which handles general weights, due to
Gabriel and Zamir (1979). For a more general discussion of missing data in
PCA see Section 13.6.
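
The idea of a weighted low-rank fit can be sketched with alternating weighted least squares: the row markers are updated by a weighted regression with the column markers held fixed, and then the roles are exchanged. This is an illustrative scheme only and is not claimed to reproduce the algorithm of Gabriel and Zamir (1979); the function names, the random starting values and the fixed number of sweeps are all assumptions made for the example.

```python
import numpy as np

def weighted_low_rank_fit(X, W, m=2, n_iter=50, seed=0):
    """Fit x_ij ~ g_i' h_j by minimising sum_ij w_ij (x_ij - g_i' h_j)^2.

    Hypothetical helper using alternating weighted least squares.
    Entries with zero weight (e.g. missing values) are ignored.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    X = np.where(W > 0, X, 0.0)            # neutralise NaNs at missing cells
    n, p = X.shape
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((p, m))        # arbitrary starting column markers
    G = np.zeros((n, m))
    for _ in range(n_iter):
        for i in range(n):                 # update g_i with H fixed
            A = H.T @ (W[i][:, None] * H)
            b = H.T @ (W[i] * X[i])
            G[i] = np.linalg.lstsq(A, b, rcond=None)[0]
        for j in range(p):                 # update h_j with G fixed
            A = G.T @ (W[:, j][:, None] * G)
            b = G.T @ (W[:, j] * X[:, j])
            H[j] = np.linalg.lstsq(A, b, rcond=None)[0]
    return G, H

def weighted_loss(X, W, G, H):
    """Weighted residual sum of squares over cells with positive weight."""
    resid = np.where(W > 0, X - G @ H.T, 0.0)
    return float(np.sum(W * resid ** 2))

# Simulated rank-2 data with roughly 10% of the cells marked missing.
rng = np.random.default_rng(1)
X_full = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 5))
W = (rng.random(X_full.shape) > 0.1).astype(float)   # 1 = present, 0 = missing
X_obs = np.where(W > 0, X_full, np.nan)

G, H = weighted_low_rank_fit(X_obs, W, m=2)
loss = weighted_loss(X_obs, W, G, H)
```

Each update solves a small least squares problem exactly, so the weighted loss cannot increase from one sweep to the next. Convergence to a global minimum is not guaranteed, and in practice several starting values might be tried.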
5.4 Correspondence Analysis
The technique commonly called correspondence analysis has been
‘rediscovered’ many times in several different guises with various names,
such as ‘reciprocal averaging’ or ‘dual scaling.’ Greenacre (1984) provides
a comprehensive treatment of the subject; in particular his Section 1.3 and
Chapter 4 discuss, respectively, the history and the various different
approaches to the topic. Benzécri (1992) is also comprehensive, and more
recent, but its usefulness is limited by a complete lack of references to
other sources. Two shorter texts, which concentrate on the more practical
aspects of correspondence analysis, are Clausen (1998) and Greenacre
(1993).
The name ‘correspondence analysis’ is derived from the French ‘analyse
des correspondances’ (Benzécri, 1980). Although, at first sight,
correspondence analysis seems unrelated to PCA, it can be shown that it is,
in fact, equivalent to a form of PCA for discrete (generally nominal)
variables (see Section 13.1). The technique is often used to provide a
graphical representation of data in two dimensions. The data are normally
presented in the form of a contingency table, but because of this graphical
usage the technique is introduced briefly in the present chapter. Further
discussion of correspondence analysis and various generalizations of the
technique, together with its connections to PCA, is given in Sections 13.1,
14.1 and 14.2.
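
For readers who have raw observations rather than a ready-made table, the contingency table that serves as input to correspondence analysis can be formed by cross-classifying two categorical variables. The sketch below uses invented category labels and variable names, all of which are assumptions made for illustration.

```python
import numpy as np

# Invented observations on two nominal variables (illustration only).
row_var = np.array(["a", "b", "a", "c", "b", "a", "c", "c"])
col_var = np.array(["x", "y", "y", "x", "x", "x", "y", "y"])

# Map each label to an integer code for its category.
row_levels, row_idx = np.unique(row_var, return_inverse=True)
col_levels, col_idx = np.unique(col_var, return_inverse=True)

# N[i, j] counts the observations falling in row category i
# and column category j.
N = np.zeros((row_levels.size, col_levels.size), dtype=int)
np.add.at(N, (row_idx, col_idx), 1)
```

The entries of `N` sum to the number of observations, and its row and column totals give the marginal frequencies of the two variables.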
Suppose that a set of data is presented in the form of a two-way
contingency table, in which a set of $n$ observations is classified
according to its values on two discrete random variables. Thus the
information available is the set of frequencies
$\{n_{ij},\ i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, c\}$, where $n_{ij}$ is
the number of observations that take the $i$th value for the first (row)
variable

