A.1. Numerical Calculation of Principal Components
observation on each PC, and hence all the information that is required
to construct a biplot (see Section 5.3). The PC scores would otherwise
need to be derived as an extra step after calculating the eigenvalues and
eigenvectors of the covariance or correlation matrix S = (1/(n−1))X'X.
The values of the PC scores are related to the eigenvectors of XX', which
can be derived from the eigenvectors of X'X (see the proof of Property G4
in Section 3.2); conversely, the eigenvectors of X'X can be found from
those of XX'. In circumstances where the sample size n is smaller than the
number of variables p, XX' has smaller dimensions than X'X, so that it
can be advantageous to use the algorithms described above, based on the
power method or QL method, on a multiple of XX' rather than X'X in such
cases. Large computational savings are possible when n ≪ p, as in chemical
spectroscopy or in the genetic example of Hastie et al. (2000), which is
described in Section 9.2 and which has n = 48, p = 4673. Algorithms also
exist for updating the SVD if data arrive sequentially (see for example
Berry et al. (1995)).
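As an illustration of this n ≪ p short-cut (not part of the original text), the following NumPy sketch obtains the leading eigenvectors of X'X from the much smaller n × n matrix XX' and checks the result against the SVD of X. The random data, the shapes echoing the Hastie et al. (2000) example, and the choice of k = 5 components are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 48, 4673                     # illustrative shapes with n much smaller than p
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                 # column-centre so that X'X/(n-1) is the covariance matrix

# Work with the small n x n matrix XX' instead of the p x p matrix X'X.
small = X @ X.T
eigval, U = np.linalg.eigh(small)   # eigenvalues in ascending order
order = np.argsort(eigval)[::-1]
eigval, U = eigval[order], U[:, order]

# Non-zero eigenvalues of X'X equal those of XX'; the corresponding
# eigenvectors (PC loadings) are X'u_k, rescaled to unit length.
k = 5
loadings = X.T @ U[:, :k] / np.sqrt(eigval[:k])   # p x k eigenvectors of X'X
scores = X @ loadings                             # PC scores, as needed for a biplot

# The same quantities come directly from the SVD of X.
U_svd, sing, At = np.linalg.svd(X, full_matrices=False)
assert np.allclose(sing[:k] ** 2, eigval[:k])                          # squared singular values
assert np.allclose(np.abs(loadings.T @ At[:k].T), np.eye(k), atol=1e-6)  # same loadings up to sign
```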
Neural Network Algorithms
Neural networks provide ways of extending PCA, including some non-linear
generalizations (see Sections 14.1.3, 14.6.1). They also give alternative
algorithms for estimating ‘ordinary’ PCs. The main difference between these
algorithms and the techniques described earlier in the Appendix is that
most are ‘adaptive’ rather than ‘batch’ methods. If the whole of a data
set is collected before PCA is done and parallel processing is not possible,
then batch methods such as the QR algorithm are hard to beat (see
Diamantaras and Kung, 1996 (hereafter DK96), Sections 3.5.3, 4.4.1). On the
other hand, if data arrive sequentially and PCs are re-estimated when new
data become available, then adaptive neural network algorithms come into
their own. DK96, Section 4.2.7 note that ‘there is a plethora of alternative
[neural network] techniques that perform PCA.’ They describe a selection
of single-layer techniques in their Section 4.2, with an overview of these in
their Table 4.1. Different algorithms arise depending on
• whether the first or last few PCs are of interest;
• whether one or more than one PC is required;
• whether individual PCs are wanted or whether subspaces spanned by
several PCs will suffice;
• whether the network is required to be biologically plausible.
DK96, Section 4.2.7 treat finding the last few PCs as a different technique,
calling it minor component analysis.
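As a concrete illustration of the adaptive, single-layer algorithms referred to above, the following sketch implements Oja's single-unit learning rule, one of the best-known rules of this type: the weight vector is updated as each observation arrives and converges towards the first eigenvector of the covariance matrix, so the leading PC is re-estimated on-line rather than in batch. The simulated data stream and the learning rate are illustrative assumptions, not details taken from DK96.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
# Illustrative stream of observations with an (unknown) covariance structure.
A = rng.standard_normal((p, p))
cov = A @ A.T
stream = rng.multivariate_normal(np.zeros(p), cov, size=20000)

w = rng.standard_normal(p)
w /= np.linalg.norm(w)
eta = 1e-3                          # learning rate (assumed, not tuned)
for x in stream:
    y = w @ x                       # output of the single linear unit
    w += eta * y * (x - y * w)      # Oja's update: Hebbian term with a decay that keeps ||w|| near 1

# Compare with the leading eigenvector from a batch eigendecomposition.
eigval, eigvec = np.linalg.eigh(cov)
v1 = eigvec[:, -1]
print(abs(w @ v1))                  # typically close to 1 once w has converged towards +/- v1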
In their Section 4.4, DK96 compare the properties, including speed, of
seven algorithms using simulated data. In Section 4.5 they discuss
multi-layer networks.

