1.1. Definition and Derivation of Principal Components
much of Chapters 2 and 3 and concentrate their attention on later chapters,
although Sections 2.3, 2.4, 3.3, 3.4, 3.8, and to a lesser extent 3.5, are likely
to be of interest to most readers.
To derive the form of the PCs, consider first $\alpha_1'\mathbf{x}$; the vector $\alpha_1$ maximizes $\mathrm{var}[\alpha_1'\mathbf{x}] = \alpha_1'\Sigma\alpha_1$. It is clear that, as it stands, the maximum will not be achieved for finite $\alpha_1$, so a normalization constraint must be imposed. The constraint used in the derivation is $\alpha_1'\alpha_1 = 1$, that is, the sum of squares of elements of $\alpha_1$ equals 1. Other constraints, for example $\max_j |\alpha_{1j}| = 1$, may be more useful in other circumstances, and can easily be substituted later on. However, the use of constraints other than $\alpha_1'\alpha_1 = \text{constant}$ in the derivation leads to a more difficult optimization problem, and it will produce a set of derived variables different from the PCs.
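The need for the normalization constraint can be illustrated numerically. The sketch below is not from the text; it assumes an arbitrary illustrative covariance matrix `Sigma` and shows that rescaling $\alpha_1$ by a constant $c$ multiplies $\alpha_1'\Sigma\alpha_1$ by $c^2$, so the variance is unbounded unless $\alpha_1'\alpha_1$ is fixed.

```python
import numpy as np

# Illustrative covariance matrix (an assumption, not taken from the text).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

alpha = np.array([1.0, 1.0, 0.0])

# var[alpha' x] = alpha' Sigma alpha; rescaling alpha by c multiplies it
# by c**2, so without a constraint the "maximum" is unbounded.
for c in (1.0, 10.0, 100.0):
    a = c * alpha
    print(c, a @ Sigma @ a)

# The constraint alpha' alpha = 1 fixes the scale.
a = alpha / np.linalg.norm(alpha)
print(a @ Sigma @ a)
```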
To maximize $\alpha_1'\Sigma\alpha_1$ subject to $\alpha_1'\alpha_1 = 1$, the standard approach is to use the technique of Lagrange multipliers. Maximize
\[
\alpha_1'\Sigma\alpha_1 - \lambda(\alpha_1'\alpha_1 - 1),
\]
where $\lambda$ is a Lagrange multiplier. Differentiation with respect to $\alpha_1$ gives
\[
\Sigma\alpha_1 - \lambda\alpha_1 = 0,
\]
or
\[
(\Sigma - \lambda I_p)\alpha_1 = 0,
\]
where $I_p$ is the $(p \times p)$ identity matrix. Thus, $\lambda$ is an eigenvalue of $\Sigma$ and $\alpha_1$ is the corresponding eigenvector. To decide which of the $p$ eigenvectors gives $\alpha_1'\mathbf{x}$ with maximum variance, note that the quantity to be maximized is
\[
\alpha_1'\Sigma\alpha_1 = \alpha_1'\lambda\alpha_1 = \lambda\alpha_1'\alpha_1 = \lambda,
\]
so $\lambda$ must be as large as possible. Thus, $\alpha_1$ is the eigenvector corresponding to the largest eigenvalue of $\Sigma$, and $\mathrm{var}(\alpha_1'\mathbf{x}) = \alpha_1'\Sigma\alpha_1 = \lambda_1$, the largest eigenvalue.
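As an informal numerical check of this result (an illustration, not part of the book's derivation), the sketch below, again assuming an arbitrary covariance matrix `Sigma`, computes the eigendecomposition with NumPy and confirms that the leading eigenvector attains variance $\lambda_1$ and that no other unit vector does better.

```python
import numpy as np

# Illustrative covariance matrix (an assumption, not taken from the text).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# Eigendecomposition of the symmetric matrix Sigma; np.linalg.eigh returns
# eigenvalues in ascending order, so the last pair is the largest.
eigvals, eigvecs = np.linalg.eigh(Sigma)
lambda1 = eigvals[-1]
alpha1 = eigvecs[:, -1]

# alpha_1' Sigma alpha_1 equals lambda_1, the largest eigenvalue.
print(alpha1 @ Sigma @ alpha1, lambda1)

# No other unit vector achieves a larger value of alpha' Sigma alpha.
rng = np.random.default_rng(0)
for _ in range(1000):
    a = rng.normal(size=3)
    a /= np.linalg.norm(a)              # impose alpha' alpha = 1
    assert a @ Sigma @ a <= lambda1 + 1e-10
```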
In general, the $k$th PC of $\mathbf{x}$ is $\alpha_k'\mathbf{x}$ and $\mathrm{var}(\alpha_k'\mathbf{x}) = \lambda_k$, where $\lambda_k$ is the $k$th largest eigenvalue of $\Sigma$, and $\alpha_k$ is the corresponding eigenvector. This will now be proved for $k = 2$; the proof for $k \geq 3$ is slightly more complicated, but very similar.
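The general statement can be checked on simulated data. The following sketch is an illustration under assumed simulated inputs, not the author's code: the sample variance of the $k$th PC scores reproduces $\lambda_k$, the $k$th largest eigenvalue of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: n observations on p correlated variables (illustrative).
n, p = 50_000, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
S = np.cov(X, rowvar=False)             # sample covariance matrix

# Sort eigenpairs so that lambda_1 >= lambda_2 >= ... >= lambda_p.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lambdas = eigvals[order]
A = eigvecs[:, order]                   # column k-1 holds alpha_k

# The kth PC score for each observation is alpha_k' x; its sample variance
# equals lambda_k, the kth largest eigenvalue of S.
scores = (X - X.mean(axis=0)) @ A
print(scores.var(axis=0, ddof=1))
print(lambdas)
```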
The second PC, $\alpha_2'\mathbf{x}$, maximizes $\alpha_2'\Sigma\alpha_2$ subject to being uncorrelated with $\alpha_1'\mathbf{x}$, or equivalently subject to $\mathrm{cov}[\alpha_1'\mathbf{x}, \alpha_2'\mathbf{x}] = 0$, where $\mathrm{cov}(x, y)$ denotes the covariance between the random variables $x$ and $y$. But
\[
\mathrm{cov}[\alpha_1'\mathbf{x}, \alpha_2'\mathbf{x}] = \alpha_1'\Sigma\alpha_2 = \alpha_2'\Sigma\alpha_1 = \alpha_2'\lambda_1\alpha_1 = \lambda_1\alpha_2'\alpha_1 = \lambda_1\alpha_1'\alpha_2.
\]
Thus, any of the equations
\[
\alpha_1'\Sigma\alpha_2 = 0, \qquad \alpha_2'\Sigma\alpha_1 = 0,
\]
\[
\alpha_1'\alpha_2 = 0, \qquad \alpha_2'\alpha_1 = 0
\]
could be used to specify zero correlation between $\alpha_1'\mathbf{x}$ and $\alpha_2'\mathbf{x}$.
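A short numerical check (again an illustrative sketch under an assumed `Sigma`, not from the text) confirms that the eigenvectors corresponding to the two largest eigenvalues satisfy all of these equations, so the first two PCs are indeed uncorrelated.

```python
import numpy as np

# Illustrative covariance matrix (an assumption, not taken from the text).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)
alpha1 = eigvecs[:, -1]     # eigenvector for the largest eigenvalue
alpha2 = eigvecs[:, -2]     # eigenvector for the second largest eigenvalue

# Eigenvectors of a symmetric matrix are orthogonal, so every expression
# in the chain above is (numerically) zero: the first two PCs are
# uncorrelated.
print(alpha1 @ Sigma @ alpha2)   # alpha_1' Sigma alpha_2  ~ 0
print(alpha2 @ Sigma @ alpha1)   # alpha_2' Sigma alpha_1  ~ 0
print(alpha1 @ alpha2)           # alpha_1' alpha_2        ~ 0
```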

