Page 67 - Jolliffe I. Principal Component Analysis
3. Properties of Sample Principal Components
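\[
\begin{aligned}
&= \operatorname{tr}\Bigl[\sum_{i=1}^{n} (\mathbf{x}_i'\mathbf{B}\mathbf{B}'\mathbf{x}_i)\Bigr] \\
&= \sum_{i=1}^{n} \operatorname{tr}(\mathbf{x}_i'\mathbf{B}\mathbf{B}'\mathbf{x}_i) \\
&= \sum_{i=1}^{n} \operatorname{tr}(\mathbf{B}'\mathbf{x}_i\mathbf{x}_i'\mathbf{B}) \\
&= \operatorname{tr}\Bigl[\mathbf{B}'\Bigl(\sum_{i=1}^{n} \mathbf{x}_i\mathbf{x}_i'\Bigr)\mathbf{B}\Bigr] \\
&= \operatorname{tr}[\mathbf{B}'\mathbf{X}'\mathbf{X}\mathbf{B}] \\
&= (n-1)\operatorname{tr}(\mathbf{B}'\mathbf{S}\mathbf{B}).
\end{aligned}
\]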
Finally, from Property A1, $\operatorname{tr}(\mathbf{B}'\mathbf{S}\mathbf{B})$ is maximized when $\mathbf{B} = \mathbf{A}_q$.
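
The trace identity and the maximizing choice of $\mathbf{B}$ can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes NumPy, synthetic column-centred data, and a hypothetical helper `projected_sum_of_squares`. It compares the PC loadings $\mathbf{A}_q$ with one randomly chosen orthonormal $\mathbf{B}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 5, 2

# Synthetic data with unequal variances (an assumption, for illustration only).
raw = rng.normal(size=(n, p)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.2])
X = raw - raw.mean(axis=0)      # column-centred (n x p) data matrix
S = X.T @ X / (n - 1)           # sample covariance matrix


def projected_sum_of_squares(B):
    """Sum over observations of y_i' y_i, where y_i = B' x_i."""
    Y = X @ B                   # row i of Y is y_i'
    return float(np.sum(Y ** 2))


# A_q: eigenvectors of S belonging to its q largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
A_q = eigvecs[:, ::-1][:, :q]

# A competing (p x q) matrix with orthonormal columns, obtained by QR.
B_rand, _ = np.linalg.qr(rng.normal(size=(p, q)))

# Identity from the derivation: sum_i y_i' y_i = (n - 1) tr(B' S B).
lhs = projected_sum_of_squares(B_rand)
rhs = (n - 1) * np.trace(B_rand.T @ S @ B_rand)
identity_holds = bool(np.isclose(lhs, rhs))

# With B = A_q the sum equals (n - 1) times the sum of the q largest eigenvalues.
best = projected_sum_of_squares(A_q)
pcs_win = bool(best >= lhs)
```

A single random competitor only illustrates the result; the general statement rests on Property A1.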
Instead of treating this property (G3) as just another property of sample
PCs, it can also be viewed as an alternative derivation of the PCs. Rather
than adapting for samples the algebraic definition of population PCs given
in Chapter 1, there is an alternative geometric definition of sample PCs.
They are defined as the linear functions (projections) of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ that
successively define subspaces of dimension $1, 2, \ldots, q, \ldots, (p - 1)$ for which
the sum of squared perpendicular distances of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ from the
subspace is minimized. This definition provides another way in which PCs can
be interpreted as accounting for as much as possible of the total variation
in the data, within a lower-dimensional space. In fact, this is essentially
the approach adopted by Pearson (1901), although he concentrated on the
two special cases, where $q = 1$ and $q = (p - 1)$. Given a set of points in
$p$-dimensional space, Pearson found the ‘best-fitting line,’ and the ‘best-fitting
hyperplane,’ in the sense of minimizing the sum of squared deviations of
the points from the line or hyperplane. The best-fitting line determines the
first principal component, although Pearson did not use this terminology,
and the direction of the last PC is orthogonal to the best-fitting
hyperplane. The scores for the last PC are simply the perpendicular distances of
the observations from this best-fitting hyperplane.
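
Pearson's two special cases can be illustrated with a short numerical sketch. It is not taken from the original text: it assumes NumPy, synthetic centred data in three dimensions, lines and hyperplanes passing through the centroid, and two hypothetical helper functions. The last-PC scores computed here are signed, so their absolute values are the perpendicular distances.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3

# Synthetic correlated data (an assumption, for illustration only).
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.2]])
raw = rng.normal(size=(n, p)) @ mix
X = raw - raw.mean(axis=0)              # centre at the centroid
S = X.T @ X / (n - 1)

eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
a_first = eigvecs[:, -1]                # direction of the first PC
a_last = eigvecs[:, 0]                  # direction of the last PC


def ssq_dist_to_line(X, u):
    """Sum of squared perpendicular distances to the line through the origin along unit vector u."""
    proj = np.outer(X @ u, u)
    return float(np.sum((X - proj) ** 2))


def ssq_dist_to_hyperplane(X, normal):
    """Sum of squared perpendicular distances to the hyperplane through the origin with unit normal."""
    return float(np.sum((X @ normal) ** 2))


# An arbitrary competing unit direction.
u = rng.normal(size=p)
u /= np.linalg.norm(u)

# q = 1: the first PC gives the best-fitting line.
line_pc = ssq_dist_to_line(X, a_first)
line_other = ssq_dist_to_line(X, u)

# q = p - 1: the last PC is normal to the best-fitting hyperplane.
plane_pc = ssq_dist_to_hyperplane(X, a_last)
plane_other = ssq_dist_to_hyperplane(X, u)

# Last-PC scores versus distances from each point to its projection on the hyperplane.
last_scores = X @ a_last
proj_onto_plane = X - np.outer(last_scores, a_last)
dists = np.linalg.norm(X - proj_onto_plane, axis=1)
```

In this sketch `line_pc` and `plane_pc` should not exceed their competitors, and `np.abs(last_scores)` should agree with `dists` up to rounding.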
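Property G4. Let $\mathbf{X}$ be the $(n \times p)$ matrix whose $(i, j)$th element is
$\tilde{x}_{ij} - \bar{x}_j$, and consider the matrix $\mathbf{X}\mathbf{X}'$. The $i$th diagonal element of $\mathbf{X}\mathbf{X}'$
is $\sum_{j=1}^{p} (\tilde{x}_{ij} - \bar{x}_j)^2$, which is the squared Euclidean distance of $\mathbf{x}_i$ from the
centre of gravity $\bar{\mathbf{x}}$ of the points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, where $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$. Also,
the $(h, i)$th element of $\mathbf{X}\mathbf{X}'$ is $\sum_{j=1}^{p} (\tilde{x}_{hj} - \bar{x}_j)(\tilde{x}_{ij} - \bar{x}_j)$, which measures
the cosine of the angle between the lines joining $\mathbf{x}_h$ and $\mathbf{x}_i$ to $\bar{\mathbf{x}}$, multiplied
by the distances of $\mathbf{x}_h$ and $\mathbf{x}_i$ from $\bar{\mathbf{x}}$. Thus $\mathbf{X}\mathbf{X}'$ contains information
about the configuration of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ relative to $\bar{\mathbf{x}}$. Now suppose that
$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are projected onto a $q$-dimensional subspace with the usual
orthogonal transformation $\mathbf{y}_i = \mathbf{B}'\mathbf{x}_i$, $i = 1, 2, \ldots, n$. Then the transfor-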

