
14. Generalizations and Adaptations of Principal Component Analysis
14.6 Miscellanea

This penultimate section briefly discusses some topics involving PCA that do not fit very naturally into any of the other sections of the book.
14.6.1 Principal Components and Neural Networks

This subject is sufficiently large to have a book devoted to it (Diamantaras and Kung, 1996). The use of neural networks to provide non-linear extensions of PCA is discussed in Section 14.1.3, and computational aspects are revisited in Appendix A1. A few other related topics are noted here, drawing mainly on Diamantaras and Kung (1996), to which the interested reader is referred for further details. Much of the work in this area is concerned with constructing efficient algorithms, based on neural networks, for deriving PCs. There are variations depending on whether a single PC or several PCs are required, whether the first or last PCs are of interest, and whether the chosen PCs are found simultaneously or sequentially. The advantage of neural network algorithms is greatest when data arrive sequentially, so that the PCs need to be continually updated; a sketch of one such updating rule is given below.
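As an illustration of such a sequential algorithm, the following minimal sketch implements Oja's rule, a classic neural-network update that refines an estimate of the first PC as each observation arrives. The simulated data stream, the learning rate eta, and the number of steps are assumptions made for the demonstration; they are not taken from Diamantaras and Kung (1996).

```python
import numpy as np

rng = np.random.default_rng(0)

# Rotation sending the coordinate axes to (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
ROT = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

def next_observation():
    # Illustrative data stream (an assumption for the demo): 2-D
    # observations whose largest variance lies along (1, 1)/sqrt(2).
    z = rng.normal(size=2) * np.array([3.0, 0.5])
    return ROT @ z

w = rng.normal(size=2)              # current estimate of the first PC
w /= np.linalg.norm(w)
eta = 0.005                         # learning rate (assumed)

for _ in range(10000):
    x = next_observation()          # data arrive one at a time
    y = w @ x                       # network output: score on the component
    w += eta * y * (x - y * w)      # Oja's rule

print("estimated first PC:", w / np.linalg.norm(w))
# Converges (up to sign) to the leading eigenvector of the covariance
# matrix of x, here (1, 1)/sqrt(2).
```

The update combines a Hebbian term, $\eta y\mathbf{x}$, with a decay term, $-\eta y^2\mathbf{w}$, that keeps the weight vector approximately at unit length; without the decay, the Hebbian rule alone would diverge.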
In some algorithms the transformation to PCs is treated as deterministic; in others noise is introduced (Diamantaras and Kung, 1996, Chapter 5). In this latter case, the components are written as
$$\mathbf{y} = \mathbf{B}'\mathbf{x} + \mathbf{e},$$

and the original variables are approximated by

$$\hat{\mathbf{x}} = \mathbf{C}\mathbf{y} = \mathbf{C}\mathbf{B}'\mathbf{x} + \mathbf{C}\mathbf{e},$$

where $\mathbf{B}$, $\mathbf{C}$ are $(p \times q)$ matrices and $\mathbf{e}$ is a noise term. When $\mathbf{e} = \mathbf{0}$, minimizing $E[(\hat{\mathbf{x}} - \mathbf{x})'(\hat{\mathbf{x}} - \mathbf{x})]$ with respect to $\mathbf{B}$ and $\mathbf{C}$ leads to PCA (this follows from Property A5 of Section 2.1), but the problem is complicated by the presence of the term $\mathbf{C}\mathbf{e}$ in the expression for $\hat{\mathbf{x}}$. Diamantaras and Kung (1996, Chapter 5) describe solutions to a number of formulations of the problem of finding optimal $\mathbf{B}$ and $\mathbf{C}$. Some constraints on $\mathbf{B}$ and/or $\mathbf{C}$ are necessary to make the problem well-defined, and the different formulations correspond to different constraints. All solutions have the common feature that they involve combinations of the eigenvectors of the covariance matrix of $\mathbf{x}$ with the eigenvectors of the covariance matrix of $\mathbf{e}$. As with other signal/noise problems noted in Sections 12.4.3 and 14.2.2, it is necessary either to know the covariance matrix of $\mathbf{e}$ or to be able to estimate it separately from that of $\mathbf{x}$.
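To make the $\mathbf{e} = \mathbf{0}$ case concrete, the sketch below minimizes the sample analogue of $E[(\hat{\mathbf{x}} - \mathbf{x})'(\hat{\mathbf{x}} - \mathbf{x})]$ over $\mathbf{B}$ and $\mathbf{C}$ by plain gradient descent, then checks that the fitted rank-$q$ subspace coincides with the span of the first $q$ eigenvectors of the sample covariance matrix, as Property A5 implies. The synthetic data, the choice $q = 2$, and the optimization settings are assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic centred sample (assumed for the demo): n observations of a
# p-dimensional x with decreasing variances, so the answer is known.
n, p, q = 2000, 5, 2
X = rng.normal(size=(n, p)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2])
X -= X.mean(axis=0)

# Minimize the sample version of E[(x_hat - x)'(x_hat - x)] over B and C,
# where x_hat = C B' x (the e = 0 case), by plain gradient descent.
B = 0.1 * rng.normal(size=(p, q))
C = 0.1 * rng.normal(size=(p, q))
eta = 0.005
for _ in range(3000):
    Y = X @ B                       # component scores, n x q
    R = Y @ C.T - X                 # reconstruction residuals, n x p
    grad_C = R.T @ Y / n            # gradient of the mean squared error
    grad_B = X.T @ R @ C / n        # (both up to a constant factor of 2)
    B -= eta * grad_B
    C -= eta * grad_C

# Compare the fitted rank-q subspace with the span of the leading q
# eigenvectors of the sample covariance matrix.
S = X.T @ X / (n - 1)
_, eigvecs = np.linalg.eigh(S)
V = eigvecs[:, ::-1][:, :q]         # leading q eigenvectors
Q, _ = np.linalg.qr(C)              # orthonormal basis for the fitted subspace
print("cosines of principal angles:", np.linalg.svd(Q.T @ V, compute_uv=False))
# Values near 1 indicate the two q-dimensional subspaces coincide.
```

Note that $\mathbf{B}$ and $\mathbf{C}$ are not themselves unique: any invertible $q \times q$ matrix can be absorbed between them without changing $\mathbf{C}\mathbf{B}'$, which is why the comparison is made between subspaces rather than between individual vectors.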
Networks that implement extensions of PCA are described in Diamantaras and Kung (1996, Chapters 6 and 7). Most have links to techniques developed independently in other disciplines. As well as non-linear extensions, the following analysis methods are discussed: