Page 37 - Jolliffe I. Principal Component Analysis
P. 37
1. Introduction
6
could be used to specify zero correlation between α x and α x. Choosing
1
the last of these (an arbitrary choice), and noting that a normalization
constraint is again necessary, the quantity to be maximized is 2
α Σα 2 − λ(α α 2 − 1) − φα α 1 ,
2
2
2
where λ, φ are Lagrange multipliers. Differentiation with respect to α 2
gives
Σα 2 − λα 2 − φα 1 = 0
and multiplication of this equation on the left by α gives
1
α Σα 2 − λα α 2 − φα α 1 =0,
1
1
1
which, since the first two terms are zero and α α 1 = 1, reduces to φ =0.
1
Therefore, Σα 2 − λα 2 = 0, or equivalently (Σ − λI p )α 2 = 0,so λ is once
more an eigenvalue of Σ,and α 2 the corresponding eigenvector.
Again, λ = α Σα 2 ,so λ is to be as large as possible. Assuming that
2
Σ does not have repeated eigenvalues, a complication that is discussed in
Section 2.4, λ cannot equal λ 1 . If it did, it follows that α 2 = α 1 , violating
the constraint α α 2 = 0. Hence λ is the second largest eigenvalue of Σ,
1
and α 2 is the corresponding eigenvector.
As stated above, it can be shown that for the third, fourth, ..., pth
PCs, the vectors of coefficients α 3 , α 4 ,..., α p are the eigenvectors of Σ
corresponding to λ 3 ,λ 4 ,...,λ p , the third and fourth largest, ..., and the
smallest eigenvalue, respectively. Furthermore,
for k =1, 2,... ,p.
var[α x]= λ k
k
This derivation of the PC coefficients and variances as eigenvectors and
eigenvalues of a covariance matrix is standard, but Flury (1988, Section 2.2)
and Diamantaras and Kung (1996, Chapter 3) give alternative derivations
that do not involve differentiation.
It should be noted that sometimes the vectors α k are referred to
as ‘principal components.’ This usage, though sometimes defended (see
Dawkins (1990), Kuhfeld (1990) for some discussion), is confusing. It is
preferable to reserve the term ‘principal components’ for the derived vari-
ables α x, and refer to α k as the vector of coefficients or loadings for the
k
kth PC. Some authors distinguish between the terms ‘loadings’ and ‘coef-
ficients,’ depending on the normalization constraint used, but they will be
used interchangeably in this book.
1.2 A Brief History of Principal Component
Analysis
The origins of statistical techniques are often difficult to trace. Preisendor-
fer and Mobley (1988) note that Beltrami (1873) and Jordan (1874)

