Page 431 - Jolliffe I. Principal Component Analysis

14. Generalizations and Adaptations of Principal Component Analysis
                              description here is brief. Stone and Porrill (2001) provide a more detailed
                              introduction.
                                PCA has as its main objective the successive maximization of variance,
                              and the orthogonality and uncorrelatedness constraints are extras, which
                              are included to ensure that the different components are measuring sep-
                              arate things. By contrast, independent component analysis (ICA) takes
                              the ‘separation’ of components as its main aim. ICA starts from the view
                              that uncorrelatedness is rather limited as it only considers a lack of linear
                              relationship, and that ideally components should be statistically
                              independent. This is a stronger requirement than uncorrelatedness, with the two
                              only equivalent for jointly normal (Gaussian) random variables. ICA can thus be
                              viewed as a generalization of PCA to non-normal data, which is the reason
                              for including it in the present section. However, this may lead to the
                              mistaken belief, as implied by Aires et al. (2000), that PCA assumes normality,
                              which it does not. Aires and coworkers also describe PCA as assuming a
                              model in which the variables are linearly related to a set of underlying
                              components, apart from an error term. This is much closer to the set-up
                              for factor analysis, and it is this ‘model’ that ICA generalizes.
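
The distinction drawn above between uncorrelatedness and independence can be illustrated numerically. The following is a minimal sketch, not taken from any of the cited papers: it assumes a variable x that is uniform on (-1, 1) and sets y = x², so that y is completely determined by x and yet has essentially zero linear correlation with it. The helper name `pearson` and the sample size are illustrative choices.

```python
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


rng = random.Random(0)  # fixed seed so the run is reproducible
x = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
y = [a * a for a in x]  # y is a deterministic function of x

# Linear correlation between x and y: close to zero by symmetry.
r_xy = pearson(x, y)

# Correlation between |x| and y: large, exposing the dependence
# that the ordinary correlation coefficient fails to detect.
r_abs = pearson([abs(a) for a in x], y)

print("corr(x, y)   =", round(r_xy, 3))
print("corr(|x|, y) =", round(r_abs, 3))
```

Here x and y are uncorrelated but far from independent, which is the situation that uncorrelatedness constraints alone cannot rule out.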
                                ICA assumes, instead of the factor analysis model x = Λf + e given in
                              equation (7.1.1), that x = Λ(f), where Λ is some, not necessarily linear,
                              function and the elements of f are independent. The components (factors)
                              f are estimated by f̂, which is a function of x. The family of functions from
                              which Λ can be chosen must be defined. As in much of the ICA litera-
                              ture so far, Aires et al. (2000) and Stone and Porrill (2001) concentrate
                              on the special case where Λ is restricted to linear functions. Within the
                              chosen family, functions are found that minimize an ‘objective cost
                              function,’ based on information or entropy, which measures how far the
                              elements of f̂ are from independence. This differs from factor analysis in that
                              the latter has the objective of explaining correlations. Some details of a
                              ‘standard’ ICA method, including its entropy criterion and an algorithm
                              for implementation, are given by Stone and Porrill (2001).
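
To make the linear special case concrete, the sketch below separates two linearly mixed independent signals. It is a toy illustration under stated assumptions and is not the entropy-based method of Stone and Porrill (2001): the data are first whitened using the eigen-decomposition of their covariance matrix, and the remaining rotation is chosen by a grid search that maximizes the summed absolute excess kurtosis, a simple stand-in measure of non-normality. The mixing matrix, the uniform sources, and all function names (`whiten_2d`, `ica_2d`, and so on) are hypothetical choices made for illustration.

```python
import math
import random


def excess_kurtosis(v):
    """Sample excess kurtosis (zero for a Gaussian variable)."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m4 = sum((a - m) ** 4 for a in v) / n
    return m4 / (m2 * m2) - 3.0


def whiten_2d(x1, x2):
    """Centre two variables and rescale their PCs to unit variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    a = sum(u * u for u in c1) / n
    b = sum(u * v for u, v in zip(c1, c2)) / n
    c = sum(v * v for v in c2) / n
    # Closed-form eigen-decomposition of the 2 x 2 covariance matrix.
    phi = 0.5 * math.atan2(2.0 * b, a - c)
    half_gap = math.hypot(0.5 * (a - c), b)
    lam1 = 0.5 * (a + c) + half_gap
    lam2 = 0.5 * (a + c) - half_gap
    cp, sp = math.cos(phi), math.sin(phi)
    z1 = [(cp * u + sp * v) / math.sqrt(lam1) for u, v in zip(c1, c2)]
    z2 = [(-sp * u + cp * v) / math.sqrt(lam2) for u, v in zip(c1, c2)]
    return z1, z2


def rotate(z1, z2, theta):
    """Apply an orthogonal rotation to a pair of variables."""
    ct, st = math.cos(theta), math.sin(theta)
    y1 = [ct * u + st * v for u, v in zip(z1, z2)]
    y2 = [-st * u + ct * v for u, v in zip(z1, z2)]
    return y1, y2


def ica_2d(x1, x2, n_angles=90):
    """Whiten, then grid-search the rotation with the largest contrast."""
    z1, z2 = whiten_2d(x1, x2)
    best_theta, best_score = 0.0, -1.0
    for k in range(n_angles):
        theta = 0.5 * math.pi * k / n_angles
        y1, y2 = rotate(z1, z2, theta)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_theta, best_score = theta, score
    return rotate(z1, z2, best_theta)


# Synthetic example: two independent uniform sources, linearly mixed.
rng = random.Random(1)
n = 4000
s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
s2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x1 = [2.0 * a + 1.0 * b for a, b in zip(s1, s2)]  # assumed mixing
x2 = [1.0 * a + 1.0 * b for a, b in zip(s1, s2)]

f1, f2 = ica_2d(x1, x2)  # estimated components
```

The estimates `f1` and `f2` can recover the sources only up to order, sign and scale. The whitening step alone yields uncorrelated variables; it is the subsequent rotation, driven by a criterion that looks beyond second moments, that moves them towards independence.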
                                Typically, an iterative method is used to find the optimal f̂, and like
                              projection pursuit (see Section 9.2.2), a technique with which Stone and
                              Porrill (2001) draw parallels, it is computationally expensive. As with pro-
                              jection pursuit, PCA can be used to reduce dimensionality (use the first m,
                              rather than all p) before starting the ICA algorithm, in order to reduce the
                              computational burden (Aires et al., 2000; Stone and Porrill, 2001). It is also
                              suggested by Aires and coworkers that the PCs form a good starting point
                              for the iterative algorithm, as they are uncorrelated. These authors give
                              an example involving sea surface temperature, in which they claim that
                              the ICs are physically more meaningful than PCs. The idea that physically
                              meaningful signals underlying a data set should be independent is a ma-
                              jor motivation for ICA. This is very different from the view taken in some
                              applications of factor analysis or rotated PCA, where it is believed that un-