Page 48 - Jolliffe I. Principal Component Analysis
2.1. Optimal Algebraic Properties of Population Principal Components
It follows that although Property A5 is stated as an algebraic property,
it can equally well be viewed geometrically. In fact, it is essentially the
population equivalent of sample Property G3, which is stated and proved
in Section 3.2. No proof of the population result A5 will be given here; Rao
(1973, p. 591) outlines a proof in which y is replaced by an equivalent set
of uncorrelated linear functions of x, and it is interesting to note that the
PCs are the only set of p linear functions of x that are uncorrelated and
have orthogonal vectors of coefficients. This last result is prominent in the
discussion of Chapter 11.
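The two properties just mentioned can be checked numerically. The following
is a minimal sketch, assuming Python with NumPy (not part of the original
text) and a hypothetical, randomly generated covariance matrix. It confirms
that the PC coefficient vectors are orthogonal and that the PCs are
uncorrelated, whereas an arbitrary orthogonal transformation generally
leaves correlated variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4 x 4 population covariance matrix (an assumption for
# illustration; any symmetric positive definite matrix would do).
M = rng.standard_normal((4, 4))
Sigma = M @ M.T + 4 * np.eye(4)

# Columns of A are eigenvectors of Sigma, i.e. the PC coefficient vectors.
eigvals, A = np.linalg.eigh(Sigma)

# Orthogonal vectors of coefficients: A'A = I.
coef_gram = A.T @ A

# Covariance matrix of the PCs z = A'x is A' Sigma A; it should be diagonal.
pc_cov = A.T @ Sigma @ A

# For contrast: a random orthogonal matrix Q also has orthogonal columns,
# but the variables Q'x are in general still correlated.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
other_cov = Q.T @ Sigma @ Q
other_offdiag = other_cov - np.diag(np.diag(other_cov))

print(np.allclose(coef_gram, np.eye(4)))     # orthogonality of coefficients
print(np.allclose(pc_cov, np.diag(eigvals))) # PCs are uncorrelated
print(np.abs(other_offdiag).max())           # nonzero covariances remain
```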
A special case of Property A5 was pointed out in Hotelling’s (1933)
original paper. He notes that the first PC derived from a correlation matrix
is the linear function of x that has greater mean square correlation with
the elements of x than does any other linear function. We return to this
interpretation of the property, and extend it, in Section 2.3.
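Hotelling's observation can be illustrated with a small numerical
experiment. The sketch below assumes Python with NumPy and a hypothetical
correlation matrix R; the helper name mean_sq_corr is introduced here for
illustration only. For standardized x, the covariance between a'x and the
jth element of x is the jth element of Ra, so the mean square correlation
is a'R²a / (p a'Ra), which for the first PC equals its eigenvalue divided
by p.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5

# Hypothetical correlation matrix R, obtained by rescaling a random
# covariance matrix (an assumption for illustration).
M = rng.standard_normal((p, p))
S = M @ M.T + np.eye(p)
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)

def mean_sq_corr(a, R):
    """Mean squared correlation between a'x and the elements of x,
    where x is standardized and has correlation matrix R."""
    cov = R @ a          # cov(x_j, a'x) for each j
    var = a @ R @ a      # var(a'x); var(x_j) = 1 for every j
    return np.mean(cov**2 / var)

eigvals, vecs = np.linalg.eigh(R)  # eigenvalues in ascending order
a1 = vecs[:, -1]                   # coefficients of the first PC
best = mean_sq_corr(a1, R)

# Compare with many randomly chosen linear functions of x.
others = [mean_sq_corr(rng.standard_normal(p), R) for _ in range(1000)]

print(best, eigvals[-1] / p, max(others))
```

In this sketch none of the random linear functions should exceed the value
attained by the first PC, although a finite random search is only a check,
not a proof.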
A modification of Property A5 can be introduced by noting that if x is
predicted by a linear function of $y = B'x$, then it follows from standard
results from multivariate regression (see, for example, Mardia et al., 1979,
p. 160), that the residual covariance matrix for the best such predictor is
$$\Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{yx}, \tag{2.1.11}$$
where $\Sigma_x = \Sigma$, $\Sigma_y = B'\Sigma B$, as defined before, $\Sigma_{xy}$ is the matrix whose
$(j, k)$th element is the covariance between the $j$th element of x and the
$k$th element of y, and $\Sigma_{yx}$ is the transpose of $\Sigma_{xy}$. Now $\Sigma_{yx} = B'\Sigma$, and
$\Sigma_{xy} = \Sigma B$, so (2.1.11) becomes
$$\Sigma - \Sigma B(B'\Sigma B)^{-1}B'\Sigma. \tag{2.1.12}$$
The diagonal elements of (2.1.12) are $\sigma_j^2$, $j = 1, 2, \ldots, p$, so, from Property
A5, $B = A_q$ minimizes
$$\sum_{j=1}^{p} \sigma_j^2 = \operatorname{tr}[\Sigma - \Sigma B(B'\Sigma B)^{-1}B'\Sigma].$$
A derivation of this result in the sample case, and further discussion of it,
is provided by Jong and Kotz (1999).
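The trace criterion can be examined numerically. The following is a minimal
sketch, assuming Python with NumPy, a hypothetical random covariance matrix,
and illustrative helper names (residual_cov, random_orthonormal) that do not
come from the original text. It evaluates the residual covariance matrix
(2.1.12) for B = A_q and for randomly drawn orthonormal B, and compares the
traces.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 5, 2

def residual_cov(Sigma, B):
    """Residual covariance matrix (2.1.12) when x is predicted from y = B'x."""
    SB = Sigma @ B                                  # Sigma B, shape (p, q)
    # (B' Sigma B)^{-1} B' Sigma via a linear solve; SB.T equals B' Sigma
    # because Sigma is symmetric.
    return Sigma - SB @ np.linalg.solve(B.T @ SB, SB.T)

def random_orthonormal(p, q):
    """A (p x q) matrix with orthonormal columns, drawn at random."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, q)))
    return Q

# Hypothetical population covariance matrix (assumption for illustration).
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)

eigvals, vecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
A_q = vecs[:, ::-1][:, :q]              # coefficients of the first q PCs

trace_pc = np.trace(residual_cov(Sigma, A_q))
trace_rand = min(
    np.trace(residual_cov(Sigma, random_orthonormal(p, q)))
    for _ in range(1000)
)

# With B = A_q the sum of residual variances equals the sum of the
# (p - q) smallest eigenvalues of Sigma.
print(trace_pc, eigvals[: p - q].sum(), trace_rand)
```

Using a linear solve rather than an explicit matrix inverse is a design
choice made here for numerical stability; it does not alter the quantity
being computed.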
An alternative criterion is $\|\Sigma - \Sigma B(B'\Sigma B)^{-1}B'\Sigma\|$, where $\|\cdot\|$ denotes
the Euclidean norm of a matrix and equals the square root of the sum of
squares of all the elements in the matrix. It can be shown (Rao, 1964) that
this alternative criterion is also minimized when $B = A_q$.
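The norm-based criterion can be checked in the same spirit. The sketch below
assumes Python with NumPy and a hypothetical random covariance matrix; the
helper name residual_norm is illustrative only. It computes the Euclidean
(Frobenius) norm of the residual covariance matrix for B = A_q and for
random orthonormal B.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 6, 3

def residual_norm(Sigma, B):
    """Euclidean (Frobenius) norm of Sigma - Sigma B (B' Sigma B)^{-1} B' Sigma."""
    SB = Sigma @ B
    resid = Sigma - SB @ np.linalg.solve(B.T @ SB, SB.T)
    return np.linalg.norm(resid, "fro")

# Hypothetical population covariance matrix (assumption for illustration).
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)

eigvals, vecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
A_q = vecs[:, ::-1][:, :q]              # coefficients of the first q PCs

norm_pc = residual_norm(Sigma, A_q)

# Smallest norm found among randomly drawn orthonormal (p x q) matrices.
norm_rand = min(
    residual_norm(Sigma, np.linalg.qr(rng.standard_normal((p, q)))[0])
    for _ in range(1000)
)

# With B = A_q the norm equals the square root of the sum of squares of
# the (p - q) smallest eigenvalues of Sigma.
print(norm_pc, np.sqrt(np.sum(eigvals[: p - q] ** 2)), norm_rand)
```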
This section has dealt with PCs derived from covariance matrices. Many
of their properties are also relevant, in modified form, for PCs based on
correlation matrices, as discussed in Section 2.3. That section also contains
a further algebraic property which is specific to correlation matrix-based
PCA.

