Page 438 - Jolliffe I. Principal Component Analysis
14.6. Miscellanea
14.6.3 Regression Components, Sweep-out Components and Extended Components
Ottestad (1975) proposed an alternative to PCA that he called regression
components. He developed this for standardized variables, and hence it is
a correlation-based analysis. The new variables, or regression components,
$y_1, y_2, \ldots, y_p$ are defined in terms of the original (standardized) variables
$x_1, x_2, \ldots, x_p$ as
$$y_1 = x_1,\quad y_2 = x_2 - b_{21}x_1,\quad y_3 = x_3 - b_{31}x_1 - b_{32}x_2,\quad \ldots,$$
$$y_p = x_p - b_{p1}x_1 - b_{p2}x_2 - \cdots - b_{p(p-1)}x_{(p-1)},$$
where $b_{jk}$ is the regression coefficient of $x_k$ in a regression of $x_j$ on all other
variables on the right-hand side of the equation defining $y_j$. It should be
stressed that the labelling in these defining equations has been chosen for
simplicity to correspond to the order in which the y variables are defined. It
will usually be different from the labelling of the data as originally recorded.
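The defining equations amount to saying that, for a fixed ordering, each $y_j$ is the residual from a least-squares regression of $x_j$ on $x_1, \ldots, x_{j-1}$. A minimal NumPy sketch of this is given below. The helper name `regression_components` and its argument `order` (the column indices of the data matrix that play the roles of $x_1, \ldots, x_p$) are illustrative assumptions, not part of Ottestad (1975); the variables are standardized first to match the correlation-based setting.

```python
import numpy as np

def regression_components(X, order):
    """Regression components for a fixed ordering (hypothetical helper).

    X     : n x p data matrix
    order : column indices of X taken as x_1, ..., x_p
    Returns the n x p matrix whose columns are y_1, ..., y_p.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize: correlation-based
    Z = Z[:, order]
    Y = np.empty_like(Z)
    Y[:, 0] = Z[:, 0]                          # y_1 = x_1
    for j in range(1, Z.shape[1]):
        # least-squares coefficients b_{j1}, ..., b_{j(j-1)} of x_j on x_1, ..., x_{j-1}
        b, *_ = np.linalg.lstsq(Z[:, :j], Z[:, j], rcond=None)
        Y[:, j] = Z[:, j] - Z[:, :j] @ b       # y_j is the regression residual
    return Y

# Illustrative data: 300 observations on 4 correlated variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 4))
Y = regression_components(X, [2, 0, 3, 1])
G = Y.T @ Y / len(Y)    # sample covariance matrix of the y's (their means are zero)
```

Because each residual is orthogonal to the variables it was regressed on, and hence to every earlier $y_k$, the off-diagonal entries of `G` vanish up to rounding error; the diagonal entry for $y_j$ is $1 - R_j^2$, where $R_j$ is the multiple correlation of $x_j$ with $x_1, \ldots, x_{j-1}$.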
The $x$ variables can be selected in any order to define the $y$ variables, and
the objective of the technique is to choose a best order from the $p!$
possibilities. This is done by starting with $y_p$, for which $x_p$ is chosen to be
the original variable that has maximum multiple correlation with the other
$(p-1)$ variables. The next variable $x_{(p-1)}$, from which $y_{(p-1)}$ is defined,
minimizes $(1 + b_{p(p-1)}^2)(1 - R^2)$, where $R$ denotes the multiple
correlation of $x_{(p-1)}$ with $x_{(p-2)}, x_{(p-3)}, \ldots, x_1$, and so on until only $x_1$ is left.
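The search for an ordering can be sketched from the correlation matrix alone. The sketch rests on two assumptions that are not in the text. First, only the first two steps are spelled out, so "and so on" is read here as repeating the second step: $b$ is the coefficient of each candidate in the regression that defines the most recently chosen $y$, and $R$ is the multiple correlation of that candidate with the other variables still unassigned. Second, ties are broken by taking the first candidate. The names `regress` and `ottestad_order` are hypothetical.

```python
import numpy as np

def regress(R, target, predictors):
    """Regression of standardized variable `target` on `predictors`, using only
    the correlation matrix R. Returns (coefficients, squared multiple correlation)."""
    S = list(predictors)
    beta = np.linalg.solve(R[np.ix_(S, S)], R[S, target])
    return beta, float(R[target, S] @ beta)

def ottestad_order(R):
    """Sketch of Ottestad's ordering; returns variable indices for x_1, ..., x_p."""
    remaining = list(range(R.shape[0]))
    # First step: x_p has maximum multiple correlation with the other p - 1 variables.
    r2 = [regress(R, j, [k for k in remaining if k != j])[1] for j in remaining]
    chosen = [remaining.pop(int(np.argmax(r2)))]     # filled from x_p downwards
    while len(remaining) > 1:
        # Assumed generalization of the second step: b is the coefficient of each
        # candidate in the regression defining the most recently chosen y.
        b, _ = regress(R, chosen[-1], remaining)
        scores = []
        for i, c in enumerate(remaining):
            others = [k for k in remaining if k != c]
            _, r2_c = regress(R, c, others)          # R^2 of candidate with the rest
            scores.append((1.0 + b[i] ** 2) * (1.0 - r2_c))
        chosen.append(remaining.pop(int(np.argmin(scores))))  # first candidate wins ties
    chosen.append(remaining[0])                       # only x_1 is left
    return chosen[::-1]

# Toy example: variables 0 and 1 uncorrelated, variable 2 close to their sum.
Sigma = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.1]])
sd = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(sd, sd)
order = ottestad_order(R)
```

In this toy correlation matrix, variable 2 has the largest multiple correlation with the other two, so it is placed last and defines $y_p$.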
The reasoning behind the method, which gives uncorrelated components,
is that it provides results that are simpler to interpret than PCA in the
examples that Ottestad (1975) studies. However, orthogonality of vectors
of coefficients and successive variance maximization are both lost. Unlike
the techniques described in Chapter 11, the method targets no explicit form of
simplicity, nor does it make any overt attempt to limit variance loss, so
it is quite different from PCA.
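Both properties can be checked numerically. For standardized variables in a fixed order with correlation matrix $\mathbf{R}$, the regression components can be written as $\mathbf{y} = \mathbf{A}\mathbf{x}$, where $\mathbf{A}$ is unit lower triangular with row $j$ equal to $(-b_{j1}, \ldots, -b_{j(j-1)}, 1, 0, \ldots, 0)$. One convenient way to obtain $\mathbf{A}$, assumed here and not taken from Ottestad (1975), is from the Cholesky factorization $\mathbf{R} = \mathbf{C}\mathbf{C}'$ as $\mathbf{A} = \mathrm{diag}(\mathbf{C})\,\mathbf{C}^{-1}$. The sketch checks that $\mathbf{A}\mathbf{R}\mathbf{A}'$ is diagonal, while the rows of $\mathbf{A}$ are not mutually orthogonal, in contrast to the eigenvectors of $\mathbf{R}$ used in PCA. The function name is hypothetical.

```python
import numpy as np

def regression_coefficient_matrix(R):
    """Unit lower-triangular A such that y = A x gives the regression components
    for standardized variables already arranged in the chosen order."""
    C = np.linalg.cholesky(R)                # R = C C', C lower triangular
    return np.diag(np.diag(C)) @ np.linalg.inv(C)

# Illustrative data: 500 observations on 4 correlated variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))
R = np.corrcoef(X, rowvar=False)
A = regression_coefficient_matrix(R)
cov_y = A @ R @ A.T       # covariance of the components: diagonal up to rounding
gram = A @ A.T            # inner products of coefficient vectors: not diagonal
_, V = np.linalg.eigh(R)  # PCA coefficient vectors, for comparison: orthonormal
```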
A variation on the same theme is proposed by Atiqullah and Uddin
(1993). They also produce new variables $y_1, y_2, \ldots, y_p$ from a set of
measured variables $x_1, x_2, \ldots, x_p$ in a sequence
$$y_1 = x_1,\quad y_2 = x_2 - b_{21}x_1,\quad y_3 = x_3 - b_{31}x_1 - b_{32}x_2,\quad \ldots,$$
$$y_p = x_p - b_{p1}x_1 - b_{p2}x_2 - \cdots - b_{p(p-1)}x_{(p-1)},$$
but for a different set of $b_{kj}$. Although the details are not entirely clear, it
appears that, unlike Ottestad’s (1975) method, the ordering in the sequence
is not determined by statistical criteria, but simply corresponds to the
labels on the original x variables. Atiqullah and Uddin (1993) transform
the covariance matrix for the x variables into upper triangular form, with
diagonal elements equal to unity. The elements of this matrix above the
diagonal are then the $b_{kj}$. As with Ottestad’s method, the new variables,
called sweep-out components, are uncorrelated.
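Because the details in Atiqullah and Uddin (1993) are not entirely clear, the sketch below rests on an assumption: that transforming the covariance matrix into upper triangular form means ordinary Gaussian elimination without pivoting, with the variables in their original label order and each row then divided by its pivot. The function name `sweep_out` is hypothetical. Recording the row operations in a matrix $\mathbf{E}$ gives $\mathbf{y} = \mathbf{E}\mathbf{x}$ with diagonal covariance, so the resulting components are uncorrelated.

```python
import numpy as np

def sweep_out(S):
    """Hypothetical sketch: Gaussian elimination (no pivoting) on the covariance
    matrix S, variables taken in their original label order.

    Returns
      E : unit lower-triangular matrix recording the row operations, y = E x
      M : eliminated S with each row divided by its pivot (unit upper triangular)
      d : the pivots, which are the variances of y_1, ..., y_p
    """
    p = S.shape[0]
    U = np.array(S, dtype=float)          # working copy, becomes upper triangular
    E = np.eye(p)
    for k in range(p - 1):
        for i in range(k + 1, p):
            m = U[i, k] / U[k, k]        # multiplier that zeroes U[i, k]
            U[i, :] -= m * U[k, :]
            E[i, :] -= m * E[k, :]       # apply the same row operation to E
    d = np.diag(U).copy()
    M = U / d[:, None]
    return E, M, d

# Illustrative data: 200 observations on 3 correlated variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 3))
S = np.cov(X, rowvar=False)
E, M, d = sweep_out(S)
Y = (X - X.mean(axis=0)) @ E.T            # sweep-out components, one per column
cov_y = np.cov(Y, rowvar=False)           # diagonal up to rounding, entries d
```

Under this reading, the above-diagonal entries of `M` express each $x_j$ in terms of the earlier $y_k$, while the coefficients on the $x_k$ themselves are the negated below-diagonal entries of `E`; in general these are not the same numbers, and which of them corresponds to the paper's $b_{kj}$ is left open here.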
Rather than compare variances of $y_1, y_2, \ldots, y_p$, which do not sum to
$\sum_{j=1}^{p}\operatorname{var}(x_j)$, both Ottestad (1975) and Atiqullah and Uddin (1993) de-

