Page 42 - Jolliffe I. Principal Component Analysis

Most of the properties described in this chapter have sample counterparts. Some have greater relevance in the sample context, but it is more convenient to introduce them here rather than in Chapter 3.
2.1 Optimal Algebraic Properties of Population Principal Components and Their Statistical Implications

Consider again the derivation of PCs given in Chapter 1, and denote by $\mathbf{z}$ the vector whose $k$th element is $z_k$, the $k$th PC, $k = 1, 2, \ldots, p$. (Unless stated otherwise, the $k$th PC will be taken to mean the PC with the $k$th largest variance, with corresponding interpretations for the ‘$k$th eigenvalue’ and ‘$k$th eigenvector.’) Then

$$\mathbf{z} = \mathbf{A}'\mathbf{x}, \tag{2.1.1}$$
where $\mathbf{A}$ is the orthogonal matrix whose $k$th column, $\boldsymbol{\alpha}_k$, is the $k$th eigenvector of $\boldsymbol{\Sigma}$. Thus, the PCs are defined by an orthonormal linear transformation of $\mathbf{x}$. Furthermore, we have directly from the derivation in Chapter 1 that

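$$\boldsymbol{\Sigma}\mathbf{A} = \mathbf{A}\boldsymbol{\Lambda}, \tag{2.1.2}$$
where $\boldsymbol{\Lambda}$ is the diagonal matrix whose $k$th diagonal element is $\lambda_k$, the $k$th eigenvalue of $\boldsymbol{\Sigma}$, and $\lambda_k = \operatorname{var}(\boldsymbol{\alpha}_k'\mathbf{x}) = \operatorname{var}(z_k)$. Because $\mathbf{A}$ is orthogonal ($\mathbf{A}'\mathbf{A} = \mathbf{A}\mathbf{A}' = \mathbf{I}_p$), premultiplying (2.1.2) by $\mathbf{A}'$ or postmultiplying it by $\mathbf{A}'$ gives two alternative ways of expressing (2.1.2) that will be useful later, namely

$$\mathbf{A}'\boldsymbol{\Sigma}\mathbf{A} = \boldsymbol{\Lambda} \tag{2.1.3}$$
and
$$\boldsymbol{\Sigma} = \mathbf{A}\boldsymbol{\Lambda}\mathbf{A}'. \tag{2.1.4}$$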
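The relations (2.1.1) to (2.1.4) can be checked numerically. The following is a minimal pure-Python sketch under stated assumptions: the 4 × 4 covariance matrix `SIGMA` and the observation `x` are hypothetical values chosen only for illustration, and the helpers `matmul`, `transpose`, `eigh_jacobi` and `max_abs_diff` are our own names, not part of any library. The eigenvectors are found with the cyclic Jacobi rotation method, which is one standard way to diagonalize a symmetric matrix; the text does not prescribe any particular algorithm.

```python
import math


def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def eigh_jacobi(S, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix S by cyclic Jacobi rotations.

    Returns (lam, A): the eigenvalues in decreasing order and the orthogonal
    matrix A whose kth column is the unit eigenvector for lam[k].
    """
    n = len(S)
    D = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the off-diagonal part of D is negligible.
        if sum(D[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if D[p][q] == 0.0:
                    continue
                # Plane rotation P chosen so that entry (p, q) of P'DP is zero.
                theta = (D[q][q] - D[p][p]) / (2.0 * D[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                P = [[float(i == j) for j in range(n)] for i in range(n)]
                P[p][p] = P[q][q] = c
                P[p][q], P[q][p] = s, -s
                D = matmul(transpose(P), matmul(D, P))
                V = matmul(V, P)  # accumulate rotations so that S = V D V'
    # Sort so that column k of A belongs to the kth largest eigenvalue.
    order = sorted(range(n), key=lambda k: D[k][k], reverse=True)
    lam = [D[k][k] for k in order]
    A = [[V[i][k] for k in order] for i in range(n)]
    return lam, A


def max_abs_diff(X, Y):
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))


# Hypothetical covariance matrix, for illustration only.
SIGMA = [[4.0, 2.0, 0.6, 0.2],
         [2.0, 3.0, 0.4, 0.1],
         [0.6, 0.4, 2.0, 0.5],
         [0.2, 0.1, 0.5, 1.0]]
p = len(SIGMA)
lam, A = eigh_jacobi(SIGMA)
LAM = [[lam[i] if i == j else 0.0 for j in range(p)] for i in range(p)]
At = transpose(A)

# (2.1.1): z = A'x for a hypothetical x; since A is orthogonal, x = Az.
x = [1.0, -0.5, 2.0, 0.3]
z = [sum(At[k][i] * x[i] for i in range(p)) for k in range(p)]
x_back = [sum(A[i][k] * z[k] for k in range(p)) for i in range(p)]

# Orthogonality A'A = I, then (2.1.2), (2.1.3) and (2.1.4).
err_orth = max_abs_diff(matmul(At, A), [[float(i == j) for j in range(p)] for i in range(p)])
err_212 = max_abs_diff(matmul(SIGMA, A), matmul(A, LAM))
err_213 = max_abs_diff(matmul(matmul(At, SIGMA), A), LAM)
err_214 = max_abs_diff(matmul(matmul(A, LAM), At), SIGMA)

# lambda_k = var(alpha_k' x) = alpha_k' Sigma alpha_k.
var_z = [sum(A[i][k] * SIGMA[i][j] * A[j][k] for i in range(p) for j in range(p))
         for k in range(p)]
```

All four error terms should be at the level of floating-point rounding, and `var_z` should match `lam`. Each eigenvector is determined only up to sign, so columns of `A` may differ in sign from those produced by other software. A library routine such as NumPy's `numpy.linalg.eigh` could replace `eigh_jacobi`, but it returns eigenvalues in ascending order, so its output would need to be reversed to follow the ordering convention used here.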

The orthonormal linear transformation (2.1.1) of $\mathbf{x}$, which defines $\mathbf{z}$, has a number of optimal properties, which are now discussed in turn.
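Property A1. For any integer $q$, $1 \le q \le p$, consider the orthonormal linear transformation

$$\mathbf{y} = \mathbf{B}'\mathbf{x}, \tag{2.1.5}$$
where $\mathbf{y}$ is a $q$-element vector and $\mathbf{B}'$ is a $(q \times p)$ matrix with orthonormal rows (so that $\mathbf{B}'\mathbf{B} = \mathbf{I}_q$), and let $\boldsymbol{\Sigma}_y = \mathbf{B}'\boldsymbol{\Sigma}\mathbf{B}$ be the variance-covariance matrix for $\mathbf{y}$. Then the trace of $\boldsymbol{\Sigma}_y$, denoted $\operatorname{tr}(\boldsymbol{\Sigma}_y)$, is maximized by taking $\mathbf{B} = \mathbf{A}_q$, where $\mathbf{A}_q$ consists of the first $q$ columns of $\mathbf{A}$.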
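Property A1 can be spot-checked numerically (this illustrates the property; it does not prove it). The sketch below assumes a hypothetical 4 × 4 covariance matrix `SIGMA` and takes $q = 2$; `B` is stored as a $(p \times q)$ matrix with orthonormal columns, so that $\mathbf{y} = \mathbf{B}'\mathbf{x}$ and $\boldsymbol{\Sigma}_y = \mathbf{B}'\boldsymbol{\Sigma}\mathbf{B}$. The helper names (`eigh_jacobi`, `random_orthonormal_columns`, `trace_sigma_y`) are our own; the random competitors for `B` are generated by Gram-Schmidt orthonormalization of Gaussian vectors, an arbitrary choice made only for the comparison. A small Jacobi eigen-solver is included so that the block runs on its own with the standard library.

```python
import math
import random


def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def eigh_jacobi(S, tol=1e-20, max_sweeps=100):
    """Cyclic Jacobi eigen-decomposition of symmetric S; eigenvalues in decreasing order."""
    n = len(S)
    D = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(D[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if D[p][q] == 0.0:
                    continue
                # Plane rotation that zeroes entry (p, q).
                theta = (D[q][q] - D[p][p]) / (2.0 * D[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                P = [[float(i == j) for j in range(n)] for i in range(n)]
                P[p][p] = P[q][q] = c
                P[p][q], P[q][p] = s, -s
                D = matmul(transpose(P), matmul(D, P))
                V = matmul(V, P)
    order = sorted(range(n), key=lambda k: D[k][k], reverse=True)
    return [D[k][k] for k in order], [[V[i][k] for k in order] for i in range(n)]


def random_orthonormal_columns(p, q, rng):
    """Hypothetical helper: a (p x q) matrix B with B'B = I_q via Gram-Schmidt."""
    cols = []
    while len(cols) < q:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for u in cols:
            dot = sum(a * b for a, b in zip(u, v))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip (very unlikely) near-dependent draws
            cols.append([a / norm for a in v])
    return transpose(cols)


def trace_sigma_y(B, S):
    """tr(Sigma_y), where Sigma_y = B' S B is the covariance matrix of y = B'x."""
    Sy = matmul(matmul(transpose(B), S), B)
    return sum(Sy[k][k] for k in range(len(Sy)))


# Hypothetical covariance matrix, for illustration only.
SIGMA = [[4.0, 2.0, 0.6, 0.2],
         [2.0, 3.0, 0.4, 0.1],
         [0.6, 0.4, 2.0, 0.5],
         [0.2, 0.1, 0.5, 1.0]]
q = 2
lam, A = eigh_jacobi(SIGMA)
A_q = [row[:q] for row in A]       # first q columns of A
best = trace_sigma_y(A_q, SIGMA)   # by (2.1.3) this equals lam[0] + ... + lam[q - 1]

# Compare against many randomly drawn orthonormal transformations.
rng = random.Random(0)
others = [trace_sigma_y(random_orthonormal_columns(len(SIGMA), q, rng), SIGMA)
          for _ in range(2000)]
```

Every entry of `others` should be no larger than `best` (up to rounding). A random search like this can only fail to find a counterexample; the general result requires the algebraic argument.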
