of the mean and covariance matrix in the presence of missing values, under
the assumption of multivariate normality. Little and Rubin (1987, Section
8.2) describe three versions of the EM algorithm for solving this problem;
a number of other authors, for example Anderson (1957) and De Ligny et al.
(1981), tackled the same problem earlier by less efficient means.
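To fix ideas, the following sketch (in Python, using NumPy) shows one simple variant of such an EM iteration for a multivariate normal sample containing missing values. It is an illustrative simplification rather than a reproduction of any of the three versions described by Little and Rubin (1987); the function name, the starting values and the fixed number of iterations are arbitrary choices made here for the example.

    import numpy as np

    def em_mvn_missing(X, n_iter=50):
        # EM estimates of the mean and covariance of a multivariate normal
        # sample X (rows = observations) in which missing values are NaN.
        X = np.asarray(X, dtype=float)
        n, p = X.shape
        miss = np.isnan(X)
        mu = np.nanmean(X, axis=0)
        Xf = np.where(miss, mu, X)          # crude initial fill-in
        sigma = np.cov(Xf, rowvar=False)
        for _ in range(n_iter):
            Xhat = Xf.copy()
            C = np.zeros((p, p))            # accumulates conditional covariances
            for i in range(n):
                m, o = miss[i], ~miss[i]
                if not m.any():
                    continue
                # E-step: regress the missing entries on the observed ones
                S_oo = sigma[np.ix_(o, o)]
                S_mo = sigma[np.ix_(m, o)]
                Xhat[i, m] = mu[m] + S_mo @ np.linalg.solve(S_oo, X[i, o] - mu[o])
                C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)
            # M-step: maximum likelihood updates (divisor n)
            mu = Xhat.mean(axis=0)
            sigma = (Xhat - mu).T @ (Xhat - mu) / n + C / n
            Xf = Xhat
        return mu, sigma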
The multivariate normal assumption is a restrictive one, and Little (1988)
relaxes it by adapting the EM algorithm to find MLEs when the data are
from a multivariate t-distribution or from a mixture of two multivariate nor-
mals with different covariance matrices. He calls these ‘robust’ methods for
dealing with missing data because they assume longer-tailed distributions
than multivariate normal. Little (1988) conducts a simulation study, the
results of which demonstrate that his robust MLEs cope well with missing
data, compared to other methods discussed earlier in this section. However,
the simulation study is limited to multivariate normal data, and to data
from distributions that are similar to those assumed by the robust MLEs.
It is not clear that the good performance of the robust MLEs would be
repeated for other distributions. Little and Rubin (1987, Section 8.3) also
extend their multivariate normal procedures to deal with covariance ma-
trices on which some structure is imposed. Whilst this may be appropriate
for factor analysis it is less relevant for PCA.
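The mechanism behind the robust estimates mentioned above is seen most easily in the complete-data case. For a multivariate t-distribution with known degrees of freedom, each EM iteration downweights observations that are far, in Mahalanobis distance, from the current mean. A minimal sketch follows; it is not Little's missing-data algorithm, the degrees of freedom df are treated as fixed, and the function name is again chosen only for illustration.

    import numpy as np

    def em_mvt_complete(X, df=5.0, n_iter=50):
        # EM estimates of the location and scatter of a multivariate t sample
        # with known degrees of freedom df (complete data only).
        n, p = X.shape
        mu = X.mean(axis=0)
        sigma = np.cov(X, rowvar=False)
        for _ in range(n_iter):
            # E-step: weights shrink towards zero for outlying observations
            d = np.einsum('ij,ij->i', (X - mu) @ np.linalg.inv(sigma), X - mu)
            w = (df + p) / (df + d)
            # M-step: weighted mean and scatter (maximum likelihood divisor n)
            mu = (w[:, None] * X).sum(axis=0) / w.sum()
            diff = X - mu
            sigma = (w[:, None] * diff).T @ diff / n
        return mu, sigma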
Another adaptation of the EM algorithm for estimation of covariance
matrices, the regularized EM algorithm, is given by Schneider (2001). It
is particularly useful when the number of variables exceeds the number
of observations. Schneider (2001) adds a diagonal matrix to the current
estimate of the covariance matrix before inverting the matrix, a similar
idea to that used in ridge regression.
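The regularization step itself is easy to state. In the sketch below the regularization parameter h is simply treated as given, whereas Schneider (2001) ties its choice to the data; the function is illustrative only.

    import numpy as np

    def regularized_inverse(sigma_hat, h):
        # Ridge-like inverse of an estimated covariance matrix: inflating the
        # diagonal keeps the matrix invertible even when the number of
        # variables exceeds the number of observations.
        p = sigma_hat.shape[0]
        return np.linalg.inv(sigma_hat + h * np.eye(p))

In the missing-data EM sketched earlier, an inverse of this kind would take the place of the solve against S_oo in the E-step regressions.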
Tipping and Bishop (1999a) take the idea of maximum likelihood estima-
tion using the EM algorithm further. They suggest an iterative algorithm
in which their EM procedure for estimating the probabilistic PCA model
(Section 3.9) is combined with Little and Rubin’s (1987) methodology for
estimating the parameters of a multivariate normal distribution in the pres-
ence of missing data. The PCs are estimated directly, rather than by going
through the intermediate step of estimating the covariance or correlation
matrix. An example in which data are randomly deleted from a data set is
used by Tipping and Bishop (1999a) to illustrate their procedure.
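A rough sketch of the flavour of such an algorithm is given below; it is not Tipping and Bishop's exact procedure. Missing entries are simply refilled from the current model at each pass, a simplification that drops the extra variance terms a full treatment would carry into the M-step, but it shows how the loadings W, whose q columns span the space of the first q PCs, can be estimated without forming the covariance matrix. The function name and starting values are again choices made only for this example.

    import numpy as np

    def ppca_em_missing(X, q, n_iter=200, seed=0):
        # EM-style fit of the probabilistic PCA model x = W z + mu + noise
        # to a data matrix X (rows = observations) with NaNs for missing values.
        rng = np.random.default_rng(seed)
        X = np.asarray(X, dtype=float)
        n, p = X.shape
        miss = np.isnan(X)
        mu = np.nanmean(X, axis=0)
        Xf = np.where(miss, mu, X)
        W = rng.standard_normal((p, q))
        sigma2 = 1.0
        for _ in range(n_iter):
            # E-step: posterior moments of the q-dimensional scores z
            M = W.T @ W + sigma2 * np.eye(q)
            Minv = np.linalg.inv(M)
            Ez = (Xf - mu) @ W @ Minv                # n x q posterior means
            Ezz = n * sigma2 * Minv + Ez.T @ Ez      # summed second moments
            # M-step: update the loadings and the isotropic noise variance
            W_new = (Xf - mu).T @ Ez @ np.linalg.inv(Ezz)
            resid = Xf - mu - Ez @ W_new.T
            sigma2 = (np.sum(resid ** 2)
                      + n * sigma2 * np.trace(W_new @ Minv @ W_new.T)) / (n * p)
            W = W_new
            # Refill missing entries from the current model, then recentre
            Xf[miss] = (mu + Ez @ W.T)[miss]
            mu = Xf.mean(axis=0)
        return W, mu, sigma2

The PC directions can then be recovered from the fitted W, for example by orthogonalizing its columns, and the reconstruction mu + Ez @ W.T provides interpolated values of the kind referred to in the next paragraph.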
In the context of satellite-derived sea surface temperature measurements
with missing data caused by cloud cover, Houseago-Stokes and Challenor
(2001) compare Tipping and Bishop’s procedure with a standard inter-
polation technique followed by PCA on the interpolated data. The two
procedures give similar results, but the new method is computationally
much more efficient, partly because only the first few PCs are found and
because they are calculated directly, without the intermediate step of
estimating the covariance matrix. Houseago-Stokes and
Challenor note that the quality of interpolated data using probabilistic
PCA depends on the number of components q in the model. In the absence

