Page 134 - Jolliffe I. Principal Component Analysis

                              of dividing the jth column of X by the standard deviation of $x_j$ to give a
                              correlation biplot, here the jth column is divided by the mean of $x_j$. Of
                              course, this only makes sense for certain types of non-negative variables,
                              but Underhill (1990) shows that for such variables the resulting biplot gives
                              a useful view of the data and variables. The cosines of the angles between
                              the $\mathbf{h}_j^*$ still provide approximations to the correlations between variables,
                              but the lengths of the vectors $\mathbf{h}_j^*$ now give information on the variability
                              of the $x_j$ relative to their means.
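
As an illustration of the mean-scaled biplot described above, the following Python sketch computes row and column markers from a singular value decomposition. It rests on two assumptions that are not stated in the text: each column is centred before being divided by its mean, and the column markers absorb the singular values (scaled by $\sqrt{n-1}$). The function name `mean_scaled_biplot` and its arguments are hypothetical, chosen only for this example, and the code is not claimed to reproduce Underhill's (1990) procedure.

```python
import numpy as np


def mean_scaled_biplot(data, n_dims=2):
    """Biplot markers for columns centred and then divided by their means.

    Assumption: centring happens before the division by the mean.
    Returns (g_star, h_star): one row per observation / per variable.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    means = data.mean(axis=0)
    if np.any(means <= 0):
        raise ValueError("every variable needs a positive mean")

    # Centre each column, then express deviations relative to the mean.
    scaled = (data - means) / means

    # SVD: scaled = U diag(s) V'
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)

    # Row markers carry no singular values; column markers carry all of them.
    g_star = np.sqrt(n - 1) * u[:, :n_dims]
    h_star = (vt[:n_dims].T * s[:n_dims]) / np.sqrt(n - 1)
    return g_star, h_star
```

Under these assumptions, when all dimensions are retained the length of each row of `h_star` equals the coefficient of variation (standard deviation divided by mean) of the corresponding variable, and the cosine of the angle between two rows equals their correlation. With `n_dims=2` both become approximations.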
                                Finally, the biplot can be adapted to cope with missing values by
                              introducing weights $w_{ij}$ for each observation $x_{ij}$ when approximating $x_{ij}$ by
                              $\mathbf{g}_i^{*\prime}\mathbf{h}_j^*$. A weight of zero is given to missing values and a unit weight to those
                              values which are present. The appropriate values for $\mathbf{g}_i^*$, $\mathbf{h}_j^*$ can be
                              calculated using an algorithm which handles general weights, due to Gabriel and
                              Zamir (1979). For a more general discussion of missing data in PCA see
                              Section 13.6.
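
The weighting idea for missing values can be sketched in Python as a weighted least squares fit of a low-rank approximation. The code below uses plain alternating least squares, which is a swapped-in technique chosen for brevity; it should not be read as the algorithm of Gabriel and Zamir (1979). The function name `weighted_low_rank`, the small ridge term, the random starting values and the fixed iteration count are all assumptions made for this example.

```python
import numpy as np


def weighted_low_rank(x, weights, rank=2, n_iter=300, ridge=1e-9, seed=0):
    """Minimise sum_ij w_ij * (x_ij - g_i' h_j)**2 by alternating least squares.

    Entries with weight zero (missing values) may hold NaN; they are ignored.
    Returns (g, h) with shapes (n, rank) and (p, rank).
    """
    w = np.asarray(weights, dtype=float)
    # Replace missing entries by 0 so that NaN never enters the arithmetic.
    x = np.where(w > 0, np.asarray(x, dtype=float), 0.0)
    n, p = x.shape

    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, rank))
    h = rng.standard_normal((p, rank))
    eye = ridge * np.eye(rank)  # tiny ridge keeps the small systems solvable

    for _ in range(n_iter):
        # Update each row marker g_i with the column markers held fixed.
        for i in range(n):
            hw = h * w[i][:, None]
            g[i] = np.linalg.solve(h.T @ hw + eye, hw.T @ x[i])
        # Update each column marker h_j with the row markers held fixed.
        for j in range(p):
            gw = g * w[:, j][:, None]
            h[j] = np.linalg.solve(g.T @ gw + eye, gw.T @ x[:, j])
    return g, h
```

With unit weights for observed values and zero weights for missing ones, `g @ h.T` gives fitted values for every cell, including the missing ones. The markers are determined only up to an invertible linear transformation, so a further normalization would be needed before plotting them as a biplot.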
                              5.4 Correspondence Analysis
                              The technique commonly called correspondence analysis has been
                              ‘rediscovered’ many times in several different guises with various names, such
                              as ‘reciprocal averaging’ or ‘dual scaling.’ Greenacre (1984) provides a
                              comprehensive treatment of the subject; in particular his Section 1.3 and
                              Chapter 4 discuss, respectively, the history and the various different
                              approaches to the topic. Benzécri (1992) is also comprehensive, and more
                              recent, but its usefulness is limited by a complete lack of references to
                              other sources. Two shorter texts, which concentrate on the more practical
                              aspects of correspondence analysis, are Clausen (1998) and Greenacre
                              (1993).
                                The name ‘correspondence analysis’ is derived from the French ‘analyse
                              des correspondances’ (Benzécri, 1980). Although, at first sight,
                              correspondence analysis seems unrelated to PCA, it can be shown that it is, in fact,
                              equivalent to a form of PCA for discrete (generally nominal) variables (see
                              Section 13.1). The technique is often used to provide a graphical
                              representation of data in two dimensions. The data are normally presented in the form
                              of a contingency table, but because of this graphical usage the technique is
                              introduced briefly in the present chapter. Further discussion of
                              correspondence analysis and various generalizations of the technique, together with
                              its connections to PCA, is given in Sections 13.1, 14.1 and 14.2.
                                Suppose that a set of data is presented in the form of a two-way
                              contingency table, in which a set of n observations is classified according to its
                              values on two discrete random variables. Thus the information available is
                              the set of frequencies $\{n_{ij},\ i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, c\}$, where $n_{ij}$ is the
                              number of observations that take the ith value for the first (row) variable