the last q columns of a matrix whose kth column is the kth eigenvector
of $(X'X)^{-1}$. Furthermore, $(X'X)^{-1}$ has the same eigenvectors as $X'X$, except
that their order is reversed, so that $B_q$ must have columns equal to
the first q eigenvectors of $X'X$. As this holds for $q = 1, 2, \ldots, p$, Property
A7 is proved.
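As a numerical check on the ordering argument, here is a minimal numpy sketch (not from the text) confirming that $(X'X)^{-1}$ and $X'X$ share eigenvectors, with the eigenvalue order reversed; the random matrix X is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))           # illustrative n x p data matrix
G = X.T @ X                            # X'X

# eigh returns eigenvalues in ascending order with matching eigenvectors
vals, vecs = np.linalg.eigh(G)
inv_vals, inv_vecs = np.linalg.eigh(np.linalg.inv(G))

# Eigenvalues of (X'X)^{-1} are the reciprocals of those of X'X, so
# ascending order for one is descending order for the other.
print(np.allclose(inv_vals, 1.0 / vals[::-1]))        # expect True

# Eigenvectors agree (up to sign) once the reversed ordering is matched.
for k in range(4):
    v1 = vecs[:, ::-1][:, k]
    v2 = inv_vecs[:, k]
    print(np.isclose(abs(v1 @ v2), 1.0))              # expect True
```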
This property seems to imply that replacing the predictor variables in a
regression analysis by their first few PCs is an attractive idea, as those PCs
omitted have coefficients that are estimated with little precision. The flaw in
this argument is that nothing in Property A7 takes account of the strength
of the relationship between the dependent variable y and the elements of
x, or between y and the PCs. A large variance for $\hat{\gamma}_k$, the kth element of
$\gamma$, and hence an imprecise estimate of the degree of relationship between y
and the kth PC, $z_k$, does not preclude a strong relationship between y and
$z_k$ (see Section 8.2). Further discussion of Property A7 is given by Fomby
et al. (1978).
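The point that an imprecisely estimated coefficient does not rule out a strong relationship can be illustrated numerically. The following hypothetical Python sketch builds two highly correlated predictors, so that the second sample PC has small variance, and then lets y depend chiefly on that low-variance PC; the data and constants are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Two highly correlated predictors: the second PC has small variance.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)

# Sample PCs from the eigendecomposition of the covariance matrix S.
S = np.cov(Xc, rowvar=False)
lams, A = np.linalg.eigh(S)            # eigenvalues in ascending order
Z = Xc @ A                             # PC scores; Z[:, 0] has small variance

# Let y depend mainly on the low-variance PC.
y = 5.0 * Z[:, 0] + 0.1 * rng.normal(size=n)

# The small-variance PC can nevertheless correlate strongly with y.
for k in range(2):
    r = np.corrcoef(y, Z[:, k])[0, 1]
    print(f"PC with variance {lams[k]:.4f}: corr with y = {r:.3f}")
```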
There are a number of other properties of PCs specific to the sample
situation; most have geometric interpretations and are therefore dealt with
in the next section.
3.2 Geometric Properties of Sample Principal Components
As with the algebraic properties, the geometric properties of Chapter 2
are also relevant for sample PCs, although with slight modifications to the
statistical implications. In addition to these properties, the present section
includes a proof of a sample version of Property A5, viewed geometrically,
and introduces two extra properties which are relevant to sample, but not
population, PCs.
Property G1 is still valid for samples if $\Sigma$ is replaced by $S$. The ellipsoids
$x'S^{-1}x = \text{const}$ no longer have the interpretation of being contours of
constant probability, though they will provide estimates of such contours
if $x_1, x_2, \ldots, x_n$ are drawn from a multivariate normal distribution. Re-
introducing a non-zero mean, the ellipsoids

$$(x - \bar{x})' S^{-1} (x - \bar{x}) = \text{const}$$

give contours of equal Mahalanobis distance from the sample mean $\bar{x}$.
Flury and Riedwyl (1988, Section 10.6) interpret PCA as successively find-
ing orthogonal directions for which the Mahalanobis distance from the
data set to a hypersphere enclosing all the data is minimized (see Sec-
tions 5.3, 9.1 and 10.1 for discussion of Mahalanobis distance in a variety
of forms).
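A minimal numpy sketch of the Mahalanobis calculation above (the simulated data are invented for illustration): it computes the squared distances $(x_i - \bar{x})' S^{-1} (x_i - \bar{x})$ for each observation, so that points sharing a common value lie on the same ellipsoid.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.0]], size=100)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
S_inv = np.linalg.inv(S)

# Squared Mahalanobis distance from the sample mean:
# d^2(x) = (x - xbar)' S^{-1} (x - xbar)
D = X - xbar
d2 = np.einsum('ij,jk,ik->i', D, S_inv, D)

# Observations with equal d2 lie on the same ellipsoid
# (x - xbar)' S^{-1} (x - xbar) = const.
print(d2[:5])
```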

