Page 448 - Jolliffe I. Principal Component Analysis
A.1. Numerical Calculation of Principal Components
observation on each PC, and hence all the information that is required to construct a biplot (see Section 5.3). The PC scores would otherwise need to be derived as an extra step after calculating the eigenvalues and eigenvectors of the covariance or correlation matrix S = (1/(n−1)) X′X.
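The batch route just described can be sketched in a few lines of NumPy; the data matrix below is a random illustration, not an example from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # n = 20 observations, p = 5 variables
X = X - X.mean(axis=0)                # column-centre the data matrix

S = X.T @ X / (X.shape[0] - 1)        # sample covariance matrix S = X'X/(n-1)
eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]     # reorder so the first PC comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = X @ eigvecs                  # PC scores: one row per observation
```

The sample variance of each column of `scores` equals the corresponding eigenvalue, which is a convenient check that the decomposition was done correctly.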
The values of the PC scores are related to the eigenvectors of XX′, which can be derived from the eigenvectors of X′X (see the proof of Property G4 in Section 3.2); conversely, the eigenvectors of X′X can be found from those of XX′. In circumstances where the sample size n is smaller than the number of variables p, XX′ has smaller dimensions than X′X, so that it can be advantageous to use the algorithms described above, based on the power method or QL method, on a multiple of XX′ rather than X′X in such cases. Large computational savings are possible when n ≪ p, as in chemical spectroscopy or in the genetic example of Hastie et al. (2000), which is described in Section 9.2 and which has n = 48, p = 4673. Algorithms also exist for updating the SVD if data arrive sequentially (see, for example, Berry et al. (1995)).
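The duality between XX′ and X′X can be sketched as follows (a NumPy illustration under assumed random data, not code from the book): if u is a unit eigenvector of XX′ with eigenvalue λ > 0, then X′u/√λ is a unit eigenvector of X′X with the same eigenvalue, so only the small n × n problem need be solved when n ≪ p.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 50                            # n much smaller than p
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)                  # column-centre the data matrix

# Eigendecompose the small n x n matrix XX' instead of the p x p matrix X'X.
lam, U = np.linalg.eigh(X @ X.T)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

# Recover unit eigenvectors of X'X from those of XX' (nonzero eigenvalues
# only; centring leaves at most n - 1 of them).
k = n - 1
V = X.T @ U[:, :k] / np.sqrt(lam[:k])   # columns are eigenvectors of X'X
```

Each column of `V` satisfies (X′X)v = λv, which can be verified directly against the first column.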
Neural Network Algorithms
Neural networks provide ways of extending PCA, including some non-linear generalizations (see Sections 14.1.3, 14.6.1). They also give alternative algorithms for estimating ‘ordinary’ PCs. The main difference between these algorithms and the techniques described earlier in the Appendix is that most are ‘adaptive’ rather than ‘batch’ methods. If the whole of a data set is collected before PCA is done and parallel processing is not possible, then batch methods such as the QR algorithm are hard to beat (see Diamantaras and Kung, 1996 (hereafter DK96), Sections 3.5.3, 4.4.1). On the other hand, if data arrive sequentially and PCs are re-estimated when new data become available, then adaptive neural network algorithms come into their own. DK96 (Section 4.2.7) note that ‘there is a plethora of alternative [neural network] techniques that perform PCA.’ They describe a selection of single-layer techniques in their Section 4.2, with an overview of these in their Table 4.1. Different algorithms arise depending on

• whether the first or last few PCs are of interest;
• whether one or more than one PC is required;
• whether individual PCs are wanted or whether subspaces spanned by several PCs will suffice;
• whether the network is required to be biologically plausible.

DK96 (Section 4.2.7) treat finding the last few PCs as a different technique, calling it minor component analysis.

In their Section 4.4, DK96 compare the properties, including speed, of seven algorithms using simulated data. In Section 4.5 they discuss multi-layer networks.
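One of the simplest single-layer adaptive schemes of the kind surveyed by DK96 is Oja's rule, in which a single neuron's weight vector is updated one observation at a time and converges (up to sign) to the first PC. The sketch below uses a synthetic two-variable stream whose leading eigenvector is (1, 1)/√2; the learning rate and iteration count are illustrative choices, not values from the book:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stream of 2-D observations with covariance [[3, 2], [2, 3]]; the leading
# eigenvector of this matrix is (1, 1)/sqrt(2), with eigenvalue 5.
C = np.array([[3.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(C)

w = rng.normal(size=2)
w /= np.linalg.norm(w)                 # random unit initial weight vector
eta = 0.01                             # learning rate (illustrative)

for _ in range(5000):                  # process observations one at a time
    x = L @ rng.normal(size=2)         # next observation from the stream
    y = w @ x                          # neuron output
    w += eta * y * (x - y * w)         # Oja's update; keeps ||w|| near 1
```

Unlike the batch methods above, no covariance matrix is ever formed, which is why such rules suit sequentially arriving data.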