methods, including some based on cluster analyses of variables (see Section 9.2), were also examined but, as these do not use the PCs to select variables, they are not described here. Three main types of method using PCs were examined.
(i) Associate one variable with each of the last $m_1^*$ ($= p - m_1$) PCs and delete those $m_1^*$ variables. This can either be done once only or iteratively. In the latter case a second PCA is performed on the $m_1$ remaining variables, and a further set of $m_2^*$ variables is deleted, if appropriate. A third PCA can then be done on the $p - m_1^* - m_2^*$ variables, and the procedure is repeated until no further deletions are considered necessary. The choice of $m_1^*, m_2^*, \ldots$ is based on a criterion determined by the size of the eigenvalues $l_k$.
The reasoning behind this method is that small eigenvalues correspond
to near-constant relationships among a subset of variables. If one of
the variables involved in such a relationship is deleted (a fairly obvious
choice for deletion is the variable with the highest coefficient in absolute value in the relevant PC), little information is lost. To decide on how many variables to delete, the criterion $l_k$ is used as described in Section 6.1.2. The criterion $t_m$ of Section 6.1.1 was also tried by Jolliffe (1972), but shown to be less useful. (A sketch of the non-iterative form of this method is given in the code following this list.)
(ii) Associate a set of $m^*$ variables en bloc with the last $m^*$ PCs, and then delete these variables. Jolliffe (1970, 1972) investigated this type of method, with the $m^*$ variables either chosen to maximize sums of squares of coefficients in the last $m^*$ PCs or to be those $m^*$ variables that are best predicted by regression on the first $m = p - m^*$ PCs. Choice of $m^*$ is again based on the sizes of the $l_k$. Such methods
were found to be unsatisfactory, as they consistently failed to select
an appropriate subset for some simple correlation structures.
(iii) Associate one variable with each of the first $m$ PCs, namely the variable not already chosen with the highest coefficient in absolute value in each successive PC. These $m$ variables are retained, and the remaining $m^* = p - m$ are deleted. The arguments leading to this approach are twofold. First, it is an obvious complementary approach to (i) and,
second, in cases where there are groups of highly correlated variables it
is designed to select just one variable from each group. This will happen
because there will be exactly one high-variance PC associated with each
group (see Section 3.8). The approach is a plausible one, as a single
variable from each group should preserve most of the information given
by that group when all variables in the group are highly correlated.
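To make these selection rules concrete, the sketch below gives one possible Python implementation of the non-iterative form of method (i) and of method (iii) (the versions compared below as B2 and B4). It is a minimal sketch, not Jolliffe's original computations: the function names, the use of a correlation-matrix PCA via numpy, and the default eigenvalue cut-off `l0` are illustrative assumptions.

```python
import numpy as np

def pca_loadings(X):
    """Eigen-decomposition of the correlation matrix of X (observations in
    rows, variables in columns), sorted with the largest eigenvalue first."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]    # columns are PC coefficient vectors

def select_b2(X, l0=0.70):
    """Non-iterative method (i) (B2): for each low-variance PC (eigenvalue
    below the assumed cut-off l0), delete the not-yet-deleted variable with
    the largest coefficient in absolute value; the rest are retained."""
    eigvals, A = pca_loadings(X)
    p = A.shape[0]
    deleted = []
    for k in range(p - 1, -1, -1):              # work upwards from the smallest eigenvalue
        if eigvals[k] >= l0:
            break
        coeffs = np.abs(A[:, k]).copy()
        coeffs[deleted] = -np.inf               # a variable can only be deleted once
        deleted.append(int(np.argmax(coeffs)))
    return sorted(set(range(p)) - set(deleted))  # indices of retained variables

def select_b4(X, m):
    """Method (iii) (B4): associate one variable with each of the first m PCs,
    taking at each step the not-yet-chosen variable with the largest
    coefficient in absolute value, and retain those m variables."""
    _, A = pca_loadings(X)
    retained = []
    for k in range(m):                          # work downwards from the largest eigenvalue
        coeffs = np.abs(A[:, k]).copy()
        coeffs[retained] = -np.inf              # a variable is chosen at most once
        retained.append(int(np.argmax(coeffs)))
    return sorted(retained)
```

Both functions return the indices of the retained variables. For example, `select_b4(X, m=4)` keeps one variable for each of the four highest-variance PCs, which is how a single representative would be drawn from each group of highly correlated variables; the default `l0 = 0.70` in `select_b2` is only one plausible choice for the kind of cut-off on the $l_k$ discussed in Section 6.1.2, and an appropriate value depends on the data.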
In Jolliffe (1972) comparisons were made, using simulated data, between
non-iterative versions of method (i) and method (iii), called methods B2, B4
respectively, and with several other subset selection methods that did not
use the PCs. The results showed that the PC methods B2, B4 retained the

