spanned by the first $m$ terms in the SVD. This is equivalent to projecting the rows of $\hat{\mathbf{Y}}$ onto the subspace spanned by the first $m$ PCs of $\hat{\mathbf{Y}}$.
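A minimal NumPy sketch of this computation follows; the function name and the assumption that $\mathbf{X}$ and $\mathbf{Y}$ are column-centred data matrices are illustrative choices, not taken from the text.

```python
import numpy as np

def reduced_rank_regression(X, Y, m):
    """Rank-m fit via the SVD of the least squares fitted values
    (illustrative sketch; X and Y are column-centred data matrices)."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)   # full least squares solution
    Y_hat = X @ B_ols                               # fitted values
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    P_m = Vt[:m].T @ Vt[:m]    # projector onto the first m PCs of Y_hat
    # Keeping the first m SVD terms of Y_hat is the same as projecting
    # its rows onto the subspace spanned by its first m PCs
    return B_ols @ P_m, Y_hat @ P_m
```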
Two further equivalences are noted by ter Braak and Looman (1994), namely that the reduced rank regression model estimated in this way is equivalent to redundancy analysis, and also to PCA of instrumental variables, as introduced by Rao (1964) (see Section 14.3). Van den Brink and ter Braak (1999) also refer to redundancy analysis as ‘PCA in which sample scores are constrained to be linear combinations of the explanatory [predictor] variables.’ They extend redundancy analysis to the case where the variables in $\mathbf{X}$ and $\mathbf{Y}$ are observed over several time periods and the model changes with time. This extension is discussed further in Section 12.4.2. Because of the link with PCA, it is possible to construct biplots (see Section 5.3) of the regression coefficients in the reduced rank regression model (ter Braak and Looman, 1994).
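The quoted characterisation can be checked numerically. In the small simulation below (data, dimensions and names are invented for illustration), the PC scores of $\hat{\mathbf{Y}}$, which serve as the redundancy analysis sample scores, are exactly linear combinations of the columns of $\mathbf{X}$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6));  X -= X.mean(axis=0)
Y = X @ rng.standard_normal((6, 4)) + 0.1 * rng.standard_normal((50, 4))
Y -= Y.mean(axis=0)

B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ B_ols
_, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
scores = Y_hat @ Vt[:2].T          # sample scores on the first two PCs
# The scores equal X @ (B_ols @ Vt[:2].T): linear combinations of the
# explanatory variables, as the quotation states
assert np.allclose(scores, X @ (B_ols @ Vt[:2].T))
```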
Aldrin (2000) proposes a modification of reduced rank regression, called softly shrunk reduced-rank regression (SSRRR), in which the terms in the SVD of $\hat{\mathbf{Y}}$ are given varying non-zero weights, rather than the all-or-nothing inclusion/exclusion of terms in reduced rank regression. Aldrin (2000) also
suggests that a subset of PCs of the predictor variables may be used as
input for a reduced rank regression or SSRRR instead of the predictor
variables themselves. In a simulation study comparing least squares with a
number of biased multivariate regression procedures, SSRRR with PCs as
input seems to be the best method overall.
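A sketch of the general shape of such an estimator is given below; the weight vector is left as a user-supplied argument because Aldrin's (2000) particular data-based weighting scheme is not reproduced here. Using PCs of the predictor variables as input simply amounts to passing their PC scores in place of $\mathbf{X}$.

```python
import numpy as np

def softly_shrunk_rrr(X, Y, weights):
    """SSRRR-shaped estimator: weight, rather than truncate, the SVD
    terms of the fitted values. 'weights' is user-supplied here and does
    not reproduce Aldrin's (2000) specific shrinkage scheme."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    w = np.asarray(weights, dtype=float)
    # Reduced rank regression corresponds to w = (1, ..., 1, 0, ..., 0);
    # SSRRR replaces this with varying non-zero weights
    return B_ols @ (Vt.T * w) @ Vt
```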
Reduced rank regression models essentially assume a latent structure underlying the predictor variables, so that their dimensionality can be reduced below $p_2$. Burnham et al. (1999) describe so-called latent variable multivariate regression models, which take the idea of reduced rank regression further by postulating overlapping latent structures underlying both the response and predictor variables. The model can be written
\[
\begin{aligned}
\mathbf{X} &= \mathbf{Z}_X \boldsymbol{\Gamma}_X + \mathbf{E}_X\\
\mathbf{Y} &= \mathbf{Z}_Y \boldsymbol{\Gamma}_Y + \mathbf{E}_Y,
\end{aligned}
\]
where $\mathbf{Z}_X$, $\mathbf{Z}_Y$ are of dimension $(n \times m)$ and contain values of $m$ latent variables for the $n$ observations; $\boldsymbol{\Gamma}_X$, $\boldsymbol{\Gamma}_Y$ are $(m \times p_1)$, $(m \times p_2)$ matrices of unknown parameters, and $\mathbf{E}_X$, $\mathbf{E}_Y$ are matrices of errors.
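For illustration only, the following simulation generates data from a model of this form with partially overlapping latent structures; all dimensions, the degree of overlap and the noise scale are arbitrary choices rather than values from Burnham et al. (1999).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p1, p2 = 200, 5, 8
m_shared, m_x_only, m_y_only = 2, 1, 1   # illustrative overlap structure

Z_shared = rng.standard_normal((n, m_shared))   # common latent variables
Z_X = np.hstack([Z_shared, rng.standard_normal((n, m_x_only))])
Z_Y = np.hstack([Z_shared, rng.standard_normal((n, m_y_only))])
Gamma_X = rng.standard_normal((Z_X.shape[1], p1))   # parameter matrices
Gamma_Y = rng.standard_normal((Z_Y.shape[1], p2))

X = Z_X @ Gamma_X + 0.1 * rng.standard_normal((n, p1))  # X = Z_X Gamma_X + E_X
Y = Z_Y @ Gamma_Y + 0.1 * rng.standard_normal((n, p2))  # Y = Z_Y Gamma_Y + E_Y
```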
To fit this model, Burnham et al. (1999) suggest carrying out PCAs on the data in $\mathbf{X}$, on that in $\mathbf{Y}$, and on the combined $(n \times (p_1 + p_2))$ matrix containing both response and predictor variables. In each PCA, a judgment is made of how many PCs seem to represent common underlying structure and how many represent error or noise. Suppose that the numbers of non-noisy PCs in the three analyses are $m_X$, $m_Y$ and $m_C$, with obvious notation. The implication is then that the overlapping part of the latent structures has dimension $m_X + m_Y - m_C$. If $m_X = m_Y = m_C$ there is complete overlap, whereas if $m_C = m_X + m_Y$ there is none. This model
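Continuing with the simulated $\mathbf{X}$ and $\mathbf{Y}$ above, the sketch below mechanises this counting rule; the average-eigenvalue cut-off merely stands in for the judgment step and is not a rule prescribed by Burnham et al. (1999).

```python
import numpy as np

def count_non_noise_pcs(A):
    """Stand-in for the judgment step: count PCs of the standardised
    data whose eigenvalues exceed the average eigenvalue (a crude proxy,
    not a rule from Burnham et al.)."""
    A = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
    eigvals = np.linalg.eigvalsh(np.cov(A, rowvar=False))
    return int((eigvals > eigvals.mean()).sum())

m_X = count_non_noise_pcs(X)
m_Y = count_non_noise_pcs(Y)
m_C = count_non_noise_pcs(np.hstack([X, Y]))  # combined (n x (p1 + p2)) matrix
overlap = m_X + m_Y - m_C    # estimated dimension of the shared latent structure
print(m_X, m_Y, m_C, overlap)
```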

