Page 61 - Jolliffe I. Principal Component Analysis
3. Properties of Sample Principal Components
The seventh section then goes on to show how these distributions may be
used to make statistical inferences about the population PCs, based on
sample PCs.
Section 3.8 demonstrates how the approximate structure and variances
of PCs can sometimes be deduced from patterns in the covariance or cor-
relation matrix. Finally, in Section 3.9 we discuss models that have been
proposed for PCA. The material could equally well have been included in
Chapter 2, but because the idea of maximum likelihood estimation arises
in some of the models we include it in the present chapter.
3.1 Optimal Algebraic Properties of Sample
Principal Components
Before looking at the properties themselves, we need to establish some
notation. Suppose that we have $n$ independent observations on the
$p$-element random vector $\mathbf{x}$; denote these $n$ observations by
$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. Let
$\tilde{z}_{i1} = \mathbf{a}_1' \mathbf{x}_i$, $i = 1, 2, \ldots, n$, and
choose the vector of coefficients $\mathbf{a}_1$ to maximize the sample
variance
$$
\frac{1}{n-1} \sum_{i=1}^{n} (\tilde{z}_{i1} - \bar{z}_1)^2
$$
subject to the normalization constraint $\mathbf{a}_1' \mathbf{a}_1 = 1$.
Next let $\tilde{z}_{i2} = \mathbf{a}_2' \mathbf{x}_i$, $i = 1, 2, \ldots, n$,
and choose $\mathbf{a}_2$ to maximize the sample variance of the
$\tilde{z}_{i2}$ subject to the normalization constraint
$\mathbf{a}_2' \mathbf{a}_2 = 1$, and subject also to the $\tilde{z}_{i2}$
being uncorrelated with the $\tilde{z}_{i1}$ in the sample. Continuing this
process in an obvious manner, we have a sample version of the definition of
PCs given in Section 1.1. Thus $\mathbf{a}_k' \mathbf{x}$ is defined as the
$k$th sample PC, $k = 1, 2, \ldots, p$, and $\tilde{z}_{ik}$ is the score for
the $i$th observation on the $k$th PC. If the derivation in Section 1.1 is
followed through, but with sample variances and covariances replacing
population quantities, then it turns out that the sample variance of the PC
scores for the $k$th sample PC is $l_k$, the $k$th largest eigenvalue of the
sample covariance matrix $\mathbf{S}$ for
$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, and $\mathbf{a}_k$ is
the corresponding eigenvector, for $k = 1, 2, \ldots, p$.
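The derivation above can be sketched numerically: a minimal NumPy example (not from the book; the data are simulated for illustration) that finds the sample PCs as eigenvectors of the sample covariance matrix and checks that the variance of the $k$th column of scores equals $l_k$.

```python
# Sketch: sample PCs as eigenvectors of the sample covariance matrix S.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # n = 100 observations on p = 3 variables

S = np.cov(X, rowvar=False)            # sample covariance matrix (divisor n - 1)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]      # reorder so that l_1 >= l_2 >= ... >= l_p
l = eigvals[order]                     # l_k: variance of the kth sample PC scores
A = eigvecs[:, order]                  # kth column is the coefficient vector a_k

Z = (X - X.mean(axis=0)) @ A           # PC scores z_ik (data column-centred)

# The sample variance of the kth score column equals the kth eigenvalue l_k.
assert np.allclose(Z.var(axis=0, ddof=1), l)
```

The `ddof=1` divisor matches the $n-1$ divisor used for the sample variance in the text.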
Define the $(n \times p)$ matrices $\tilde{\mathbf{X}}$ and
$\tilde{\mathbf{Z}}$ to have $(i, k)$th elements equal to the value of the
$k$th element $\tilde{x}_{ik}$ of $\mathbf{x}_i$, and to $\tilde{z}_{ik}$,
respectively. Then $\tilde{\mathbf{Z}}$ and $\tilde{\mathbf{X}}$ are related
by $\tilde{\mathbf{Z}} = \tilde{\mathbf{X}}\mathbf{A}$, where $\mathbf{A}$ is
the $(p \times p)$ orthogonal matrix whose $k$th column is $\mathbf{a}_k$.
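A short check of this matrix relation (a sketch with simulated data, not an example from the book): because $\mathbf{A}$ is orthogonal, the data matrix is recovered from the scores as $\tilde{\mathbf{X}} = \tilde{\mathbf{Z}}\mathbf{A}'$.

```python
# Sketch: Z = X A with A orthogonal, so X = Z A'.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)                     # column-centre the data

S = np.cov(X, rowvar=False)
_, A = np.linalg.eigh(S)                   # A: (p x p) orthogonal eigenvector matrix
Z = X @ A                                  # score matrix: Z = X A

assert np.allclose(A @ A.T, np.eye(4))     # A is orthogonal
assert np.allclose(Z @ A.T, X)             # hence X is recovered as Z A'
```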
If the mean of each element of $\mathbf{x}$ is known to be zero, then
$\mathbf{S} = \frac{1}{n}\tilde{\mathbf{X}}'\tilde{\mathbf{X}}$. It is far
more usual for the mean of $\mathbf{x}$ to be unknown, and in this case the
$(j, k)$th element of $\mathbf{S}$ is
$$
\frac{1}{n-1} \sum_{i=1}^{n} (\tilde{x}_{ij} - \bar{x}_j)(\tilde{x}_{ik} - \bar{x}_k),
$$
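This elementwise formula for $\mathbf{S}$ can be verified directly: a sketch (simulated data, not from the book) computing each $(j, k)$th element from the sum above and comparing with NumPy's `np.cov`, which uses the same $n-1$ divisor.

```python
# Sketch: (j, k)th element of S from the definition, checked against np.cov.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
n, p = X.shape
xbar = X.mean(axis=0)                      # column means x̄_j

S = np.empty((p, p))
for j in range(p):
    for k in range(p):
        # (1/(n-1)) * sum_i (x_ij - x̄_j)(x_ik - x̄_k)
        S[j, k] = np.sum((X[:, j] - xbar[j]) * (X[:, k] - xbar[k])) / (n - 1)

assert np.allclose(S, np.cov(X, rowvar=False))
```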

