Page 155 - Jolliffe I. Principal Component Analysis
6. Choosing a Subset of Principal Components or Variables
for large n, PRESS(m) and W are almost equivalent to the much simpler
quantities
$$\sum_{k=m+1}^{p} l_k \quad\text{and}\quad \frac{l_m}{\sum_{k=m+1}^{p} l_k},$$
respectively. However, Gabriel (personal communication) notes that this
conclusion holds only for large sample sizes.
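
As an illustration of the two simpler quantities, the following minimal sketch computes the sum of the last p − m eigenvalues and the ratio of $l_m$ to that sum. The function name `simple_quantities` and the eigenvalues in the usage lines are hypothetical, chosen only for the example; the sketch says nothing about how close these values are to PRESS(m) and W for a given n.

```python
import numpy as np

def simple_quantities(eigvals, m):
    """Return (sum_{k=m+1}^p l_k, l_m / sum_{k=m+1}^p l_k) for 1 <= m < p."""
    # Sort so that l[0] >= l[1] >= ... >= l[p-1], i.e. l_1, ..., l_p.
    l = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = l.size
    if not 1 <= m < p:
        raise ValueError("m must satisfy 1 <= m < p")
    tail = l[m:].sum()            # l_{m+1} + ... + l_p
    return tail, l[m - 1] / tail  # l[m - 1] is l_m

# Hypothetical eigenvalues of a sample covariance matrix (p = 5).
eigvals = [4.0, 2.0, 1.0, 0.5, 0.5]
tail, ratio = simple_quantities(eigvals, m=2)
print(tail, ratio)
```

Both quantities depend only on the eigenvalues, so they can be evaluated for every candidate m without recomputing the decomposition.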
In Section 3.9 we introduced the fixed effects model. A number of authors
have used this model as a basis for constructing rules to determine m,
with some of the rules relying on the resampling ideas associated with the
bootstrap and jackknife. Recall that the model assumes that the rows $x_i$ of
the data matrix are such that $E(x_i) = z_i$, where $z_i$ lies in a q-dimensional
space $F_q$. If $e_i$ is defined as $(x_i - z_i)$, then $E(e_i) = 0$ and
$\mathrm{var}(e_i) = \frac{\sigma^2}{w_i}\Gamma$,
where $\Gamma$ is a positive definite symmetric matrix and the $w_i$ are positive
scalars whose sum is unity. For fixed q, the quantity
$$\sum_{i=1}^{n} w_i \,\|x_i - z_i\|_M^2, \tag{6.1.6}$$
given in equation (3.9.1), is to be minimized in order to estimate $\sigma^2$, the $z_i$
and $F_q$ ($\Gamma$ and the $w_i$ are assumed known). The current selection problem
is not only to estimate the unknown parameters, but also to find q. We
wish our choice of m, the number of components retained, to coincide with
the true value of q, assuming that such a value exists.
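
To make the fixed-q minimization concrete, here is a minimal sketch under simplifying assumptions that are not part of the general model: the metric M is taken to be the identity, the weights default to $w_i = 1/n$, and the subspace is centred at the weighted mean. Under these assumptions the minimizing $z_i$ are the projections of the $x_i$ onto the span of the leading q eigenvectors of the weighted covariance matrix, and the minimum of the criterion equals the sum of the remaining eigenvalues. The function name `fixed_effects_criterion` and the simulated data are hypothetical.

```python
import numpy as np

def fixed_effects_criterion(X, q, w=None):
    """Minimized value of sum_i w_i ||x_i - z_i||^2 with M = I (assumption).

    Returns the criterion value, the fitted z_i (rows of Z), and the
    eigenvalues of the weighted covariance matrix in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Assumption: equal weights 1/n unless the caller supplies weights summing to 1.
    w = np.full(n, 1.0 / n) if w is None else np.asarray(w, dtype=float)
    xbar = w @ X                          # weighted mean of the rows
    Xc = X - xbar
    V = (Xc * w[:, None]).T @ Xc          # sum_i w_i (x_i - xbar)(x_i - xbar)'
    evals, evecs = np.linalg.eigh(V)      # eigh returns ascending eigenvalues
    A = evecs[:, ::-1][:, :q]             # leading q eigenvectors
    Z = xbar + Xc @ A @ A.T               # fitted z_i in the q-dimensional subspace
    crit = float(np.sum(w * np.sum((X - Z) ** 2, axis=1)))
    return crit, Z, evals[::-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))              # hypothetical data: n = 30, p = 5
crit, Z, lam = fixed_effects_criterion(X, q=2)
print(crit, lam[2:].sum())                # the two values should agree up to rounding
```

The criterion can only decrease as q increases, which is why it cannot be used on its own to choose q and a separate rule is needed.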
To choose m, Ferré (1990) attempts to find q so that it minimizes the
loss function
$$f_q = \sum_{i=1}^{n} E\left[ w_i \,\|z_i - \hat{z}_i\|_{\Gamma^{-1}}^2 \right], \tag{6.1.7}$$
where $\hat{z}_i$ is the projection of $x_i$ onto $F_q$. The criterion $f_q$ cannot be calculated,
but must be estimated, and Ferré (1990) shows that a good estimate
of $f_q$ is
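$$\hat{f}_q = \sum_{k=q+1}^{p} \hat{\lambda}_k + \hat{\sigma}^2 \left[ 2q(n+q-p) - np + 2(p-q) + 4 \sum_{l=1}^{q} \sum_{k=q+1}^{p} \frac{\hat{\lambda}_l}{(\hat{\lambda}_l - \hat{\lambda}_k)} \right], \tag{6.1.8}$$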
where $\hat{\lambda}_k$ is the kth largest eigenvalue of $V\Gamma^{-1}$ and
$$V = \sum_{i=1}^{n} w_i (x_i - \bar{x})(x_i - \bar{x})'.$$
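
The estimate (6.1.8) can be evaluated directly once the eigenvalues and a residual variance estimate are available. The sketch below is one possible transcription, not Ferré's own code: the function name `ferre_f_hat` is hypothetical, the eigenvalues of $V\Gamma^{-1}$ are assumed to have been computed already, and $\hat{\sigma}^2$ is simply passed in as the argument `sigma2` because its estimation is not covered by the text up to this point. If an eigenvalue among the first q equals one among the last p − q, the double sum is undefined and the code will produce a division by zero.

```python
import numpy as np

def ferre_f_hat(lam, q, n, sigma2):
    """Evaluate the estimate in (6.1.8) for a given q.

    lam    : eigenvalues of V Gamma^{-1} (any order; sorted internally)
    q      : candidate dimension, 0 <= q <= p
    n      : number of observations
    sigma2 : residual variance estimate, supplied by the caller (assumption)
    """
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]  # lam_1 >= ... >= lam_p
    p = lam.size
    head, tail = lam[:q], lam[q:]                      # l = 1..q and k = q+1..p
    # Double sum of lam_l / (lam_l - lam_k) over l = 1..q, k = q+1..p.
    diff = head[:, None] - tail[None, :]
    double_sum = np.sum(head[:, None] / diff)
    bracket = 2 * q * (n + q - p) - n * p + 2 * (p - q) + 4 * double_sum
    return tail.sum() + sigma2 * bracket

# Hypothetical inputs: three eigenvalues, n = 10, sigma2 = 0.5.
lam = [4.0, 2.0, 1.0]
values = [ferre_f_hat(lam, q, n=10, sigma2=0.5) for q in range(0, 3)]
print(values)
```

Evaluating the function over a range of q and taking the minimizer gives the choice of m that this rule suggests, conditional on the supplied value of `sigma2`.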
In the special case where $\Gamma = I_p$ and $w_i = \frac{1}{n}$, $i = 1, \ldots, n$, we have
$V\Gamma^{-1} = \frac{(n-1)}{n} S$, and $\hat{\lambda}_k = \frac{(n-1)}{n} l_k$, where $l_k$ is the kth largest eigenvalue
of the sample covariance matrix S. In addition, $\hat{z}_i$ is the projection of $x_i$
onto the space spanned by the first q PCs. The residual variance $\sigma^2$ still

