Page 133 - Jolliffe I. Principal Component Analysis

5. Graphical Representation of Data Using Principal Components
                              of $X'X$, and hence to PCA. The variations discussed so far relate only to
                              choices within this classical plot, for example the choice of $\alpha$ in defining $g_i$
                              and $h_j$ (5.3.2), the possible rescaling by a factor $(n-1)^{1/2}$, and the form
                              of display (axes or arrowheads, concentration ellipses).
                                Gower and Hand (1996) describe many other variations. In particular,
                              they look at biplots related to multivariate techniques other than PCA,
                              including multidimensional scaling, canonical variate analysis, correspon-
                              dence analysis and multiple correspondence analysis. Gabriel (1995a,b) also
                              discusses biplots related to multivariate methods other than PCA, in partic-
                              ular multiple correspondence analysis and MANOVA (multivariate analysis
                              of variance).
                                A key distinction drawn by Gower and Hand (1996) is between interpo-
                              lation and prediction in a biplot. The former is concerned with determining
                              where in the diagram to place an observation, given its values on the mea-
                              sured variables. Prediction refers to estimating the values of these variables,
                              given the position of an observation in the plot. Both are straightforward
                              for classical biplots ($g_i^*$ is used for interpolation and ${}_2\tilde{x}_{ij}$ for prediction)
                              but become more complicated for other varieties of biplot. Gower and Hand
                              (1996, Chapter 7) describe a framework for generalized biplots that includes
                              most other versions as special cases. One important special case is that of
                              non-linear biplots. These will be discussed further in Section 14.1, which
                              describes a number of non-linear modifications of PCA. Similarly, discus-
                              sion of robust biplots, due to Daigle and Rivest (1992), will be deferred
                              until Section 10.4, which covers robust versions of PCA.
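The distinction between interpolation and prediction in the classical biplot can be sketched numerically. The following is a minimal illustration only, assuming the factorization used for the classical biplot (the SVD $X = ULA'$ with $G = UL^{\alpha}$ and $H = AL^{1-\alpha}$); the function names `classical_biplot`, `interpolate` and `predict` are hypothetical and the data are simulated.

```python
import numpy as np


def classical_biplot(X, alpha=1.0, m=2):
    """Row markers g*_i and column markers h*_j of a rank-m classical biplot.

    Assumes the factorization X_c = U L A' of the column-centred data,
    with G = U L^alpha and H = A L^(1 - alpha).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                # column-centre the data
    U, L, At = np.linalg.svd(Xc, full_matrices=False)
    G = U * L**alpha                             # rows are g_i
    H = At.T * L**(1.0 - alpha)                  # rows are h_j
    return G[:, :m], H[:, :m], mean, At.T, L


def interpolate(x_new, mean, A, L, alpha=1.0, m=2):
    """Interpolation: where to place an observation, given its variable values.

    Uses g = L^(alpha - 1) A' (x - mean) and keeps the first m elements.
    Requires the first m singular values to be non-zero when alpha < 1.
    """
    return ((x_new - mean) @ A[:, :m]) * L[:m]**(alpha - 1.0)


def predict(g_star, h_star):
    """Prediction: approximate centred variable values from a plot position.

    Returns g*' h*_j for every variable j (add the column means to get
    values on the original scale).
    """
    return g_star @ h_star.T


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 4))

g_star, h_star, mean, A, L = classical_biplot(X, alpha=1.0)
placed = interpolate(X[0], mean, A, L, alpha=1.0)   # coincides with g_star[0]
x_tilde = predict(g_star, h_star)                   # rank-2 approximation of X - mean
```

For an observation already in the data set, interpolation returns the plotted point itself, and prediction reproduces the rank-2 approximation of the centred data rather than the data exactly.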
                                The discussion and examples of the classical biplot given above use an
                              unstandardized form of X and hence are related to covariance matrix PCA.
                              As noted in Section 2.3 and elsewhere, it is more usual, and often more
                              appropriate, to base PCA on the correlation matrix as in the examples
                              of Section 5.3.1. Corresponding biplots can be derived from the SVD of
                              $\tilde{X}$, the column-centred data matrix whose $j$th column has been scaled by
                              dividing by the standard deviation of $x_j$, $j = 1, 2, \ldots, p$. Many aspects of
                              the biplot remain the same when the correlation, rather than covariance,
                              matrix is used. The main difference is in the positions of the $h_j$. Recall that
                              if $\alpha = 0$ is chosen, together with the scaling factor $(n-1)^{1/2}$, then the squared length
                              $h_j^{*\prime} h_j^*$ approximates the variance of $x_j$. In the case of a correlation-based
                              analysis, $\mathrm{var}(x_j) = 1$ and the quality of the biplot approximation to the
                              $j$th variable by the point representing $h_j^*$ can be judged by the closeness of
                              $h_j^*$ to the unit circle centred at the origin. For this reason, the unit circle is
                              sometimes drawn on correlation biplots to assist in evaluating the quality of
                              the approximation (Besse, 1994a). Another property of correlation biplots
                              is that the squared distance between $h_j$ and $h_k$ is $2(1 - r_{jk})$, where $r_{jk}$ is
                              the correlation between $x_j$ and $x_k$. The squared distance between $h_j^*$ and
                              $h_k^*$ approximates this quantity.
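The two properties of correlation biplots noted above can be checked numerically. This is a rough sketch on simulated data, assuming $\alpha = 0$ with the $(n-1)^{1/2}$ scaling, so that $H = AL/(n-1)^{1/2}$, and standard deviations computed with divisor $n-1$; the function name `correlation_biplot_markers` is hypothetical.

```python
import numpy as np


def correlation_biplot_markers(X):
    """Full-dimensional column markers h_j for a correlation biplot.

    Assumes alpha = 0 and the (n - 1)^(1/2) scaling, so H = A L / sqrt(n - 1),
    where U L A' is the SVD of the standardized data matrix.
    """
    n = X.shape[0]
    # Column-centre, then divide each column by its standard deviation.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, L, At = np.linalg.svd(Xs, full_matrices=False)
    return (At.T * L) / np.sqrt(n - 1)           # rows are h_j


# Simulated data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 4))

H = correlation_biplot_markers(X)
H_star = H[:, :2]                                # markers actually plotted
R = np.corrcoef(X, rowvar=False)                 # sample correlation matrix

# In full dimension every h_j has squared length var(x_j) = 1.
sq_len_full = (H**2).sum(axis=1)
# In the two-dimensional plot the squared length is at most 1; values close
# to 1 (points near the unit circle) indicate a well-represented variable.
sq_len_plot = (H_star**2).sum(axis=1)

# Squared distance between h_j and h_k is 2(1 - r_jk); the plotted markers
# only approximate it.
d2_full = ((H[0] - H[1])**2).sum()
d2_plot = ((H_star[0] - H_star[1])**2).sum()
target = 2.0 * (1.0 - R[0, 1])
```

Comparing `sq_len_plot` with 1 and `d2_plot` with `target` gives a quick indication of how much is lost by projecting onto two dimensions.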
                                An alternative to the covariance and correlation biplots is the coefficient
                              of variation biplot, due to Underhill (1990). As its name suggests, instead