3. Properties of Sample Principal Components
In the case of sample correlation matrices, one further reason can be put
forward for interest in the last few PCs, as found by Property A2. Raveh
(1985) argues that the inverse $R^{-1}$ of a correlation matrix is of greater
interest in some situations than $R$. It may then be more important to
approximate $R^{-1}$ than $R$ in a few dimensions. If this is done using the
spectral decomposition (Property A3) of $R^{-1}$, then the first few terms will
correspond to the last few PCs, since the eigenvectors of $R$ and $R^{-1}$ are the
same, except that their order is reversed. The rôle of the last few PCs will
be discussed further in Sections 3.4 and 3.7, and again in Sections 6.3, 8.4,
8.6 and 10.1.
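The point that the leading terms in the spectral decomposition of $R^{-1}$ come from the last few PCs of $R$ is easy to check numerically. The following is a minimal sketch, not part of the text; the simulated data, the seed and the variable names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))              # 100 observations on 5 variables (illustrative data)
R = np.corrcoef(X, rowvar=False)               # sample correlation matrix
R_inv = np.linalg.inv(R)

# Eigendecomposition of R with eigenvalues in descending (PC) order
lam, A = np.linalg.eigh(R)
lam, A = lam[::-1], A[:, ::-1]

# R^{-1} has the same eigenvectors as R, with eigenvalues 1/lam in reversed order:
# R^{-1} = sum_k (1/lam_k) a_k a_k'
assert np.allclose(R_inv, (A / lam) @ A.T)

# The first m terms of the spectral decomposition of R^{-1} therefore use the
# *last* m PCs of R (the eigenvectors with the smallest eigenvalues of R).
m = 2
last_pcs = A[:, -m:]
approx = (last_pcs / lam[-m:]) @ last_pcs.T    # rank-m approximation to R^{-1}
print(np.linalg.norm(R_inv - approx))          # residual after keeping m terms
```

Since $R^{-1} = \sum_k l_k^{-1} a_k a_k'$, the largest coefficients $l_k^{-1}$ are those for which $l_k$ is smallest, that is, the terms belonging to the last few sample PCs.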
One further property, which is concerned with the use of principal com-
ponents in regression, will now be discussed. Standard terminology from
regression is used and will not be explained in detail (see, for example,
Draper and Smith (1998)). An extensive discussion of the use of principal
components in regression is given in Chapter 8.
Property A7. Suppose now that $X$, defined as above, consists of $n$ ob-
servations on $p$ predictor variables $x$ measured about their sample means,
and that the corresponding regression equation is
$$y = X\beta + \epsilon, \qquad (3.1.5)$$
where $y$ is the vector of $n$ observations on the dependent variable, again
measured about the sample mean. (The notation $y$ for the dependent vari-
able has no connection with the usage of $y$ elsewhere in the chapter, but
is standard in regression.) Suppose that $X$ is transformed by the equation
$Z = XB$, where $B$ is a $(p \times p)$ orthogonal matrix. The regression equation
can then be rewritten as
$$y = Z\gamma + \epsilon,$$
where $\gamma = B^{-1}\beta$. The usual least squares estimator for $\gamma$ is
$\hat{\gamma} = (Z'Z)^{-1}Z'y$. Then the elements of $\hat{\gamma}$ have, successively, the smallest possi-
ble variances if $B = A$, the matrix whose $k$th column is the $k$th eigenvector
of $X'X$, and hence the $k$th eigenvector of $S$. Thus $Z$ consists of values of
the sample principal components for $x$.
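Property A7 can be illustrated with a short simulation. This sketch is not from the text; the simulated data, the seed and the helper name coef_variances are illustrative assumptions. With $B = A$, $Z'Z$ is diagonal with the eigenvalues of $X'X$ on its diagonal, so the coefficient variances are proportional to the reciprocal eigenvalues, and the first of them, $1/l_1$, cannot be improved upon by any other orthogonal choice of $B$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X = X - X.mean(axis=0)                     # predictors measured about their sample means

# A: columns are eigenvectors of X'X, ordered by decreasing eigenvalue
eigvals, A = np.linalg.eigh(X.T @ X)
eigvals, A = eigvals[::-1], A[:, ::-1]

def coef_variances(B):
    """Diagonal of (Z'Z)^{-1} for Z = XB; proportional to the variances of gamma-hat."""
    Z = X @ B
    return np.diag(np.linalg.inv(Z.T @ Z))

var_pc = coef_variances(A)                 # with B = A these equal 1 / eigenvalues of X'X
assert np.allclose(var_pc, 1.0 / eigvals)

# Any other orthogonal B gives a first coefficient variance at least as large,
# since 1/l_1 is the smallest eigenvalue of (X'X)^{-1}.
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))   # a random orthogonal matrix
var_other = coef_variances(Q)
print(var_pc[0] <= var_other[0] + 1e-12)   # True
```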
Proof. From standard results in regression (Draper and Smith, 1998,
Section 5.2) the covariance matrix of the least squares estimator $\hat{\gamma}$ is
proportional to
$$(Z'Z)^{-1} = (B'X'XB)^{-1} = B^{-1}(X'X)^{-1}(B')^{-1} = B'(X'X)^{-1}B,$$
as $B$ is orthogonal. We require $\operatorname{tr}(B_q'(X'X)^{-1}B_q)$, $q = 1, 2, \ldots, p$, to be min-
imized, where $B_q$ consists of the first $q$ columns of $B$. But, replacing $\Sigma$
by $(X'X)^{-1}$ in Property A2 of Section 2.1 shows that $B_q$ must consist of

