9. Principal Components Used with Other Multivariate Techniques
spanned by the first m terms in the SVD. This is equivalent to projecting the rows of $\hat{\mathbf{Y}}$ onto the subspace spanned by the first m PCs of $\hat{\mathbf{Y}}$.
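As a concrete illustration, this projection takes only a few lines of NumPy. The following is a minimal sketch, assuming column-centred data matrices and an ordinary least squares fit; the toy data, the chosen rank m, and names such as B_rrr are illustrative and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2, m = 100, 6, 4, 2

# Toy data: Y depends on X through a rank-m structure plus noise.
X = rng.standard_normal((n, p1))
Y = (X @ rng.standard_normal((p1, m)) @ rng.standard_normal((m, p2))
     + 0.1 * rng.standard_normal((n, p2)))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Ordinary least squares coefficients and fitted values.
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ B_ols

# Keeping the first m terms of the SVD of Y_hat projects its rows
# onto the subspace spanned by its first m PCs.
U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
Y_hat_m = U[:, :m] @ np.diag(s[:m]) @ Vt[:m, :]

# The corresponding rank-m coefficient matrix: the OLS coefficients
# followed by the projection onto the first m PCs of Y_hat.
B_rrr = B_ols @ Vt[:m, :].T @ Vt[:m, :]
```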
Two further equivalences are noted by ter Braak and Looman (1994), namely that the reduced rank regression model estimated in this way is equivalent to redundancy analysis, and also to PCA of instrumental variables, as introduced by Rao (1964) (see Section 14.3). Van den Brink and ter Braak (1999) also refer to redundancy analysis as ‘PCA in which sample scores are constrained to be linear combinations of the explanatory [predictor] variables.’ They extend redundancy analysis to the case where the variables in X and Y are observed over several time periods and the model changes with time. This extension is discussed further in Section 12.4.2. Because of the link with PCA, it is possible to construct biplots (see Section 5.3) of the regression coefficients in the reduced rank regression model (ter Braak and Looman, 1994).
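Because a rank-2 coefficient matrix factorises into two sets of two-dimensional markers, such a biplot can be sketched as follows. The symmetric split of the singular values between the two sets of markers is one convenient convention, not necessarily the scaling used by ter Braak and Looman (1994), and the stand-in coefficient matrix is purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
p1, p2, m = 6, 4, 2
# A rank-m matrix standing in for a reduced rank regression estimate
# (e.g. B_rrr from the sketch above).
C = rng.standard_normal((p1, m)) @ rng.standard_normal((m, p2))

# Factorise C = G H' via its SVD, splitting the singular values
# symmetrically between predictor and response markers.
U, s, Vt = np.linalg.svd(C)
G = U[:, :m] * np.sqrt(s[:m])      # predictor markers
H = Vt[:m, :].T * np.sqrt(s[:m])   # response markers

fig, ax = plt.subplots()
ax.scatter(G[:, 0], G[:, 1], marker='o', label='predictor variables')
ax.scatter(H[:, 0], H[:, 1], marker='^', label='response variables')
for i, (x, y) in enumerate(G):
    ax.annotate(f'x{i+1}', (x, y))
for j, (x, y) in enumerate(H):
    ax.annotate(f'y{j+1}', (x, y))
ax.axhline(0, lw=0.5)
ax.axvline(0, lw=0.5)
ax.legend()
plt.show()
```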
Aldrin (2000) proposes a modification of reduced rank regression, called softly shrunk reduced-rank regression (SSRRR), in which the terms in the SVD of $\hat{\mathbf{Y}}$ are given varying non-zero weights, rather than the all-or-nothing inclusion/exclusion of terms in reduced rank regression. Aldrin (2000) also suggests that a subset of PCs of the predictor variables may be used as input for a reduced rank regression or SSRRR instead of the predictor variables themselves. In a simulation study comparing least squares with a number of biased multivariate regression procedures, SSRRR with PCs as input seems to be the best method overall.
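The idea can be sketched as below, again assuming centred data. The geometrically decaying weights are purely illustrative; Aldrin (2000) estimates the weights from the data rather than fixing them in advance.

```python
import numpy as np

def ssrrr(X, Y, weights):
    """SSRRR sketch: each term of the SVD of the OLS fitted values
    gets a weight in [0, 1], rather than the all-or-nothing
    inclusion/exclusion of reduced rank regression."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, s, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    B = B_ols @ Vt.T @ np.diag(weights) @ Vt   # softly shrunk coefficients
    Y_fit = U @ np.diag(weights * s) @ Vt      # softly shrunk fitted values
    return B, Y_fit

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
Y = X @ rng.standard_normal((6, 4)) + 0.1 * rng.standard_normal((100, 4))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Illustrative geometrically decaying weights, one per SVD term.
w = 0.8 ** np.arange(Y.shape[1])
B_soft, Y_soft = ssrrr(X, Y, w)

# Aldrin's further suggestion: use a subset of PCs of X as input.
V = np.linalg.svd(X, full_matrices=False)[2]   # PC loadings of X
B_pc, _ = ssrrr(X @ V[:3].T, Y, w)             # first 3 PCs as input
```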
Reduced rank regression models essentially assume a latent structure underlying the predictor variables, so that their dimensionality can be reduced below $p_2$. Burnham et al. (1999) describe so-called latent variable multivariate regression models, which take the idea of reduced rank regression further by postulating overlapping latent structures underlying both the response and predictor variables. The model can be written

$$
\begin{aligned}
\mathbf{X} &= \mathbf{Z}_X \boldsymbol{\Gamma}_X + \mathbf{E}_X \\
\mathbf{Y} &= \mathbf{Z}_Y \boldsymbol{\Gamma}_Y + \mathbf{E}_Y,
\end{aligned}
$$

where $\mathbf{Z}_X$, $\mathbf{Z}_Y$ are of dimension $(n \times m)$ and contain values of m latent variables for the n observations; $\boldsymbol{\Gamma}_X$, $\boldsymbol{\Gamma}_Y$ are $(m \times p_1)$, $(m \times p_2)$ matrices of unknown parameters, and $\mathbf{E}_X$, $\mathbf{E}_Y$ are matrices of errors.
To fit this model, Burnham et al. (1999) suggest carrying out PCAs on the data in X, on that in Y, and on the combined $(n \times (p_1 + p_2))$ matrix containing both response and predictor variables. In each PCA, a judgment is made of how many PCs seem to represent common underlying structure and how many represent error or noise. Suppose that the numbers of non-noisy PCs in the three analyses are $m_X$, $m_Y$ and $m_C$, with obvious notation. The implication is then that the overlapping part of the latent structures has dimension $m_X + m_Y - m_C$. If $m_X = m_Y = m_C$ there is complete overlap, whereas if $m_C = m_X + m_Y$ there is none. This model
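This bookkeeping can be illustrated as follows. The cumulative-variance rule used here to decide how many PCs are non-noisy is an assumption made for concreteness; it stands in for the subjective judgment the method actually requires.

```python
import numpy as np

def n_structural_pcs(Z, threshold=0.90):
    """Number of PCs judged to carry structure rather than noise,
    here via an illustrative cumulative-variance rule."""
    Zc = Z - Z.mean(axis=0)
    var = np.linalg.svd(Zc, compute_uv=False) ** 2
    return int(np.searchsorted(np.cumsum(var) / var.sum(), threshold) + 1)

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 3))  # latent variables, fully shared here
X = Z @ rng.standard_normal((3, 6)) + 0.1 * rng.standard_normal((100, 6))
Y = Z @ rng.standard_normal((3, 4)) + 0.1 * rng.standard_normal((100, 4))

m_X = n_structural_pcs(X)                  # PCA of the predictors
m_Y = n_structural_pcs(Y)                  # PCA of the responses
m_C = n_structural_pcs(np.hstack([X, Y]))  # PCA of the combined matrix

# Dimension of the overlapping latent structure: m_X + m_Y - m_C
# (m_X = m_Y = m_C gives complete overlap; m_C = m_X + m_Y gives none).
overlap = m_X + m_Y - m_C
print(m_X, m_Y, m_C, overlap)
```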