Page 46 - Jolliffe I. Principal Component Analysis
2.1. Optimal Algebraic Properties of Population Principal Components
and, from (2.1.10),
$$\Sigma - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx} = \sum_{k=(q+1)}^{p} \lambda_k \alpha_k \alpha_k'.$$
Finding a linear function of $x$ having maximum conditional variance
reduces to finding the eigenvalues and eigenvectors of the conditional
covariance matrix, and it is easy to verify that these are simply
$(\lambda_{(q+1)}, \alpha_{(q+1)}), (\lambda_{(q+2)}, \alpha_{(q+2)}), \ldots, (\lambda_p, \alpha_p)$.
The eigenvector associated with the largest of these eigenvalues is
$\alpha_{(q+1)}$, so the required linear function is $\alpha_{(q+1)}' x$,
namely the (q + 1)th PC.
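The claim above can be checked numerically. The following is a minimal sketch, assuming that $z$ consists of the first $q$ PCs, $z = A_q' x$ (which is what the conclusion about the (q + 1)th PC suggests), and using a randomly generated covariance matrix; the names `Sigma`, `A_q`, and `cond_cov` are illustrative and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 5, 2

# Hypothetical covariance matrix: a random symmetric positive-definite Sigma.
M = rng.standard_normal((p, p))
Sigma = M @ M.T + p * np.eye(p)

# Eigenpairs of Sigma, reordered so that lambda_1 >= ... >= lambda_p.
lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]

# Assumption: z = A_q' x, i.e. z holds the first q PCs.
A_q = A[:, :q]
Sigma_xz = Sigma @ A_q              # cov(x, z)
Sigma_zz = A_q.T @ Sigma @ A_q      # cov(z), diagonal with lambda_1..lambda_q
cond_cov = Sigma - Sigma_xz @ np.linalg.inv(Sigma_zz) @ Sigma_xz.T

# Right-hand side of the identity: sum over k = q+1..p of lambda_k alpha_k alpha_k'.
rhs = sum(lam[k] * np.outer(A[:, k], A[:, k]) for k in range(q, p))

# Leading eigenpair of the conditional covariance matrix.
w, V = np.linalg.eigh(cond_cov)
top_val, top_vec = w[-1], V[:, -1]

print(np.allclose(cond_cov, rhs))
print(np.isclose(top_val, lam[q]))
```

Under these assumptions, the leading eigenvector `top_vec` should agree with $\alpha_{(q+1)}$ up to sign, which is why a comparison would use the absolute value of their inner product.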
Property A4. As in Properties A1, A2, consider the transformation
$y = B'x$. If $\det(\Sigma_y)$ denotes the determinant of the covariance matrix of $y$,
then $\det(\Sigma_y)$ is maximized when $B = A_q$.
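Property A4 can be illustrated with a small numerical experiment. This is only a sketch, assuming (as in Properties A1 and A2) that $B$ is a $(p \times q)$ matrix with orthonormal columns; the covariance matrix is random, and the helper `det_cov_y` is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 6, 3

# Hypothetical covariance matrix for x.
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)

# Eigenpairs sorted in decreasing order; A_q holds the first q eigenvectors.
lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]
A_q = A[:, :q]

def det_cov_y(B):
    """Determinant of cov(y) = B' Sigma B for y = B'x."""
    return np.linalg.det(B.T @ Sigma @ B)

best = det_cov_y(A_q)  # product of the q largest eigenvalues of Sigma

# Compare against random (p x q) matrices with orthonormal columns (via QR).
trials = [
    det_cov_y(np.linalg.qr(rng.standard_normal((p, q)))[0])
    for _ in range(1000)
]
max_random = max(trials)

print(max_random <= best)
```

A random search of this kind cannot prove the property; it only fails to find a counterexample, which is consistent with the statement.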
Proof. Consider any integer, $k$, between 1 and $q$, and let $S_k$ =
the subspace of p-dimensional vectors orthogonal to $\alpha_1, \ldots, \alpha_{k-1}$. Then
$\dim(S_k) = p - k + 1$, where $\dim(S_k)$ denotes the dimension of $S_k$. The kth
eigenvalue, $\lambda_k$, of $\Sigma$ satisfies
$$\lambda_k = \sup_{\substack{\alpha \in S_k \\ \alpha \neq 0}} \frac{\alpha' \Sigma \alpha}{\alpha' \alpha}.$$
Suppose that $\mu_1 > \mu_2 > \cdots > \mu_q$ are the eigenvalues of $B'\Sigma B$ and that
$\gamma_1, \gamma_2, \ldots, \gamma_q$ are the corresponding eigenvectors. Let $T_k$ = the subspace
of q-dimensional vectors orthogonal to $\gamma_{k+1}, \ldots, \gamma_q$, with $\dim(T_k) = k$.
Then, for any non-zero vector $\gamma$ in $T_k$,
$$\frac{\gamma' B' \Sigma B \gamma}{\gamma' \gamma} \geq \mu_k.$$
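The lower bound on the quotient over $T_k$ can be probed numerically. The sketch below assumes a random covariance matrix and a random $B$ with orthonormal columns; because the eigenvectors of the symmetric matrix $B'\Sigma B$ are mutually orthogonal, $T_k$ is taken to be the span of $\gamma_1, \ldots, \gamma_k$. The names `C`, `T_basis`, and `rayleigh` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, k = 6, 4, 2

# Hypothetical covariance matrix and a (p x q) matrix B with orthonormal columns.
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)
B, _ = np.linalg.qr(rng.standard_normal((p, q)))

# Eigenpairs of C = B' Sigma B, sorted so that mu_1 >= ... >= mu_q.
C = B.T @ Sigma @ B
mu, G = np.linalg.eigh(C)
mu, G = mu[::-1], G[:, ::-1]

# T_k is orthogonal to gamma_{k+1}..gamma_q, so it is spanned by gamma_1..gamma_k.
T_basis = G[:, :k]

def rayleigh(g):
    """Quotient g' B' Sigma B g / g' g."""
    return (g @ C @ g) / (g @ g)

# Evaluate the quotient at random non-zero vectors of T_k.
quotients = [rayleigh(T_basis @ rng.standard_normal(k)) for _ in range(1000)]
min_quotient = min(quotients)

print(min_quotient >= mu[k - 1] - 1e-9)
```

The bound is attained at $\gamma = \gamma_k$, where the quotient equals $\mu_k$ exactly, so the sampled minimum should approach $\mu_k$ from above.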
Consider the subspace $\tilde{S}_k$ of p-dimensional vectors of the form $B\gamma$ for $\gamma$ in
$T_k$.
$$\dim(\tilde{S}_k) = \dim(T_k) = k$$
(because $B$ is one-to-one; in fact, $B$ preserves lengths of vectors).
From a general result concerning dimensions of two vector spaces, we have
$$\dim(S_k \cap \tilde{S}_k) + \dim(S_k + \tilde{S}_k) = \dim S_k + \dim \tilde{S}_k.$$
But
$$\dim(S_k + \tilde{S}_k) \leq p, \quad \dim(S_k) = p - k + 1 \quad \text{and} \quad \dim(\tilde{S}_k) = k,$$
so
$$\dim(S_k \cap \tilde{S}_k) \geq 1.$$
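The dimension count can be reproduced with matrix ranks. This is a sketch under the same assumptions as the proof (a $B$ with orthonormal columns), using a random covariance matrix; $S_k$ is represented by the basis $\alpha_k, \ldots, \alpha_p$ and $\tilde{S}_k$ by $B\gamma_1, \ldots, B\gamma_k$. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q, k = 6, 4, 3

# Hypothetical covariance matrix and a (p x q) matrix B with orthonormal columns.
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)
B, _ = np.linalg.qr(rng.standard_normal((p, q)))

# Eigenvectors of Sigma and of B' Sigma B, in decreasing eigenvalue order.
_, A = np.linalg.eigh(Sigma)
A = A[:, ::-1]
_, G = np.linalg.eigh(B.T @ Sigma @ B)
G = G[:, ::-1]

S_basis = A[:, k - 1:]          # alpha_k..alpha_p spans S_k
S_tilde_basis = B @ G[:, :k]    # B gamma_1..B gamma_k spans S~_k

dim_S = np.linalg.matrix_rank(S_basis)
dim_S_tilde = np.linalg.matrix_rank(S_tilde_basis)
# dim(S_k + S~_k) is the rank of the two bases stacked side by side.
dim_sum = np.linalg.matrix_rank(np.hstack([S_basis, S_tilde_basis]))
# Dimension formula: dim(S ∩ S~) = dim S + dim S~ - dim(S + S~).
dim_intersection = dim_S + dim_S_tilde - dim_sum

print(dim_S, dim_S_tilde, dim_sum, dim_intersection)
```

Because `dim_sum` can never exceed `p` while the two dimensions add up to `p + 1`, the intersection is forced to contain at least one non-zero direction.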

