
13. Principal Component Analysis for Special Types of Data
of the mean and covariance matrix in the presence of missing values, under the assumption of multivariate normality. Little and Rubin (1987, Section 8.2) describe three versions of the EM algorithm for solving this problem; a number of other authors, for example, Anderson (1957) and De Ligny et al. (1981), tackled the same problem earlier by less efficient means.
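For concreteness, the structure of such an EM algorithm can be sketched as follows. This is a minimal illustration in Python rather than Little and Rubin's own implementation; the function name, the initialization and the fixed iteration count are choices made purely for illustration.

```python
import numpy as np

def em_normal_missing(X, n_iter=50):
    """EM estimates of the mean vector and covariance matrix for an
    (n x p) data matrix X containing NaNs, assuming multivariate normality.
    A minimal sketch, not Little and Rubin's (1987) exact implementation."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = np.nanmean(X, axis=0)             # initial mean from observed values
    sigma = np.diag(np.nanvar(X, axis=0))  # crude diagonal initial covariance
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            miss = np.isnan(X[i])
            obs = ~miss
            x_hat = X[i].copy()
            cov_add = np.zeros((p, p))
            if miss.any():
                # E-step: conditional mean and covariance of the missing
                # entries given the observed ones under the current estimates
                s_oo = sigma[np.ix_(obs, obs)]
                s_mo = sigma[np.ix_(miss, obs)]
                x_hat[miss] = mu[miss] + s_mo @ np.linalg.solve(
                    s_oo, X[i, obs] - mu[obs])
                cov_add[np.ix_(miss, miss)] = (sigma[np.ix_(miss, miss)]
                    - s_mo @ np.linalg.solve(s_oo, s_mo.T))
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cov_add
        # M-step: update mean and covariance from expected sufficient statistics
        mu = sum_x / n
        sigma = sum_xx / n - np.outer(mu, mu)
    return mu, sigma
```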
The multivariate normal assumption is a restrictive one, and Little (1988) relaxes it by adapting the EM algorithm to find MLEs when the data are from a multivariate t-distribution or from a mixture of two multivariate normals with different covariance matrices. He calls these ‘robust’ methods for dealing with missing data because they assume longer-tailed distributions than multivariate normal. Little (1988) conducts a simulation study, the results of which demonstrate that his robust MLEs cope well with missing data, compared to other methods discussed earlier in this section. However, the simulation study is limited to multivariate normal data, and to data from distributions that are similar to those assumed by the robust MLEs. It is not clear that the good performance of the robust MLEs would be repeated for other distributions. Little and Rubin (1987, Section 8.3) also extend their multivariate normal procedures to deal with covariance matrices on which some structure is imposed. Whilst this may be appropriate for factor analysis, it is less relevant for PCA.
Another adaptation of the EM algorithm for estimation of covariance matrices, the regularized EM algorithm, is given by Schneider (2001). It is particularly useful when the number of variables exceeds the number of observations. Schneider (2001) adds a diagonal matrix to the current estimate of the covariance matrix before inverting it, an idea similar to that used in ridge regression.
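The essence of this regularization can be conveyed by a short sketch. The regularization parameter h and the use of the diagonal of the current estimate are illustrative assumptions, not Schneider's (2001) exact prescription; in a scheme like the one sketched earlier, a call of this form would replace the plain inversion of the observed-block covariance in the E-step.

```python
import numpy as np

def regularized_solve(s_oo, rhs, h=0.1):
    """Solve (S_oo + h * D) x = rhs, where D holds the diagonal of S_oo.
    Inflating the diagonal before inversion stabilizes the E-step regression
    when the number of variables exceeds the number of observations, in the
    spirit of ridge regression (illustrative choice of D and h)."""
    D = np.diag(np.diag(s_oo))
    return np.linalg.solve(s_oo + h * D, rhs)
```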
Tipping and Bishop (1999a) take the idea of maximum likelihood estimation using the EM algorithm further. They suggest an iterative algorithm in which their EM procedure for estimating the probabilistic PCA model (Section 3.9) is combined with Little and Rubin's (1987) methodology for estimating the parameters of a multivariate normal distribution in the presence of missing data. The PCs are estimated directly, rather than by going through the intermediate step of estimating the covariance or correlation matrix. An example in which data are randomly deleted from a data set is used by Tipping and Bishop (1999a) to illustrate their procedure.
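A rough sketch of such a scheme is given below, again as an assumption-laden illustration rather than Tipping and Bishop's exact algorithm: standard EM updates for the probabilistic PCA loadings are alternated with refreshing the missing entries from the current low-rank reconstruction, which is a simplification of the full treatment in which missing values are handled exactly within the E-step.

```python
import numpy as np

def ppca_em_missing(X, q, n_iter=100, rng=None):
    """Probabilistic PCA with q components fitted by EM to an (n x p) data
    matrix X containing NaNs. Missing entries are refreshed from the current
    model reconstruction at each iteration (illustrative simplification)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    Xc = np.where(miss, mu, X)            # start by mean-imputing missing entries
    W = rng.standard_normal((p, q))       # random initial loadings
    sigma2 = 1.0
    for _ in range(n_iter):
        mu = Xc.mean(axis=0)
        Y = Xc - mu
        # E-step: posterior moments of the q latent scores for each observation
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        Ez = Y @ W @ Minv                          # n x q posterior means
        sum_Ezz = n * sigma2 * Minv + Ez.T @ Ez    # summed second moments
        # M-step: update loadings W and residual variance sigma^2
        W = (Y.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (np.sum(Y**2) - 2.0 * np.sum(Ez * (Y @ W))
                  + np.trace(sum_Ezz @ W.T @ W)) / (n * p)
        # refresh the missing entries from the current low-rank reconstruction
        Xc = np.where(miss, mu + Ez @ W.T, X)
    return W, sigma2, mu, Xc
```

The columns of W span the same subspace as the leading PCs, so the components are obtained directly, without forming a covariance or correlation matrix at any stage.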
In the context of satellite-derived sea surface temperature measurements with missing data caused by cloud cover, Houseago-Stokes and Challenor (2001) compare Tipping and Bishop's procedure with a standard interpolation technique followed by PCA on the interpolated data. The two procedures give similar results, but the new method is computationally much more efficient. This is partly because only the first few PCs are found and because they are calculated directly, without the intermediate step of estimating the covariance matrix. Houseago-Stokes and Challenor note that the quality of interpolated data using probabilistic PCA depends on the number of components q in the model. In the absence