Page 431 - Jolliffe I. Principal Component Analysis
14. Generalizations and Adaptations of Principal Component Analysis
description here is brief. Stone and Porrill (2001) provide a more detailed
introduction.
PCA has as its main objective the successive maximization of variance,
and the orthogonality and uncorrelatedness constraints are extras, which
are included to ensure that the different components are measuring sep-
arate things. By contrast, independent component analysis (ICA) takes
the ‘separation’ of components as its main aim. ICA starts from the view
that uncorrelatedness is rather limited as it only considers a lack of linear
relationship, and that ideally components should be statistically indepen-
dent. This is a stronger requirement than uncorrelatedness, with the two
only equivalent for normal (Gaussian) random variables. ICA can thus be
viewed as a generalization of PCA to non-normal data, which is the reason
for including it in the present section. However, this may lead to the mis-
taken belief, as implied by Aires et al. (2000), that PCA assumes normality,
which it does not. Aires and coworkers also describe PCA as assuming a
model in which the variables are linearly related to a set of underlying
components, apart from an error term. This is much closer to the set-up
for factor analysis, and it is this ‘model’ that ICA generalizes.
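The gap between uncorrelatedness and independence is easy to exhibit numerically. The following sketch (illustrative only; the simulated variables are not from the text) constructs two variables that are uncorrelated yet strongly dependent, since one is a deterministic function of the other:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x ** 2  # y is completely determined by x, hence dependent on it

# The linear correlation is essentially zero, because E[x * x^2] = E[x^3] = 0
# for a symmetric distribution; uncorrelatedness sees no relationship here.
print(np.corrcoef(x, y)[0, 1])  # close to 0
```

For jointly normal variables this situation cannot arise, which is why uncorrelatedness and independence coincide only in the Gaussian case.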
ICA assumes, instead of the factor analysis model x = Λf + e given in
equation (7.1.1), that x = Λ(f), where Λ is some, not necessarily linear,
function and the elements of f are independent. The components (factors)
f are estimated by f̂, which is a function of x. The family of functions from
which Λ can be chosen must be defined. As in much of the ICA litera-
ture so far, Aires et al. (2000) and Stone and Porrill (2001) concentrate
on the special case where Λ is restricted to linear functions. Within the
chosen family, functions are found that minimize an ‘objective cost func-
tion,’ based on information or entropy, which measures how far the
elements of f̂ are from independence. This differs from factor analysis in that
the latter has the objective of explaining correlations. Some details of a
‘standard’ ICA method, including its entropy criterion and an algorithm
for implementation, are given by Stone and Porrill (2001).
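A minimal sketch of the linear case may help fix ideas. The code below simulates x = Λf with independent non-Gaussian elements of f, whitens the data with PCA, and then recovers components by fixed-point iterations that drive a kurtosis-based contrast to an extremum, with deflation to keep successive directions orthogonal. The kurtosis contrast stands in for the entropy criterion of Stone and Porrill (2001), which is not reproduced here; all numerical choices (sources, mixing matrix, iteration counts) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
# Two independent, non-Gaussian (uniform, unit-variance) sources f,
# mixed linearly: x = f Lambda^T.
f = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))
Lam = np.array([[1.0, 0.5],
                [0.3, 1.0]])
x = f @ Lam.T

# Step 1: whiten via PCA (center, rotate to PCs, scale to unit variance).
# This is also where dimension reduction (keep m < p PCs) would occur.
xc = x - x.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(xc.T))
z = (xc @ vecs) / np.sqrt(vals)

# Step 2: for each component, iterate w <- E[z (w'z)^3] - 3w (the
# kurtosis-based fixed point), orthogonalizing against earlier rows.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        u = z @ w
        w_new = (z * u[:, None] ** 3).mean(axis=0) - 3 * w
        w_new -= W[:i].T @ (W[:i] @ w_new)  # deflation (Gram-Schmidt)
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < 1e-10
        w = w_new
        if converged:
            break
    W[i] = w

f_hat = z @ W.T  # estimated independent components
```

Up to the usual ICA indeterminacies of sign, scale, and ordering, each column of f_hat should line up with one of the original sources, which can be checked by correlating the two sets of columns.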
Typically, an iterative method is used to find the optimal f̂, and like
projection pursuit (see Section 9.2.2), a technique with which Stone and
Porrill (2001) draw parallels, it is computationally expensive. As with pro-
jection pursuit, PCA can be used to reduce dimensionality (use the first m,
rather than all p) before starting the ICA algorithm, in order to reduce the
computational burden (Aires et al., 2000; Stone and Porrill, 2001). It is also
suggested by Aires and coworkers that the PCs form a good starting point
for the iterative algorithm, as they are uncorrelated. These authors give
an example involving sea surface temperature, in which they claim that
the ICs are physically more meaningful than PCs. The idea that physically
meaningful signals underlying a data set should be independent is a ma-
jor motivation for ICA. This is very different from the view taken in some
applications of factor analysis or rotated PCA, where it is believed that un-