Page 401 - Jolliffe I. Principal Component Analysis
13. Principal Component Analysis for Special Types of Data
taken by a candidate are therefore ‘missing.’ Shibayama (1990) devises a
method for producing a linear combination of the examination scores that
represents the overall performance of each candidate. When p′ = p the
method is equivalent to PCA.
Anderson et al. (1983) report a method that they attribute to Dear
(1959), which is not for dealing with missing values in a PCA, but which
uses PCA to impute missing data in a more general context. The idea seems
to be to first substitute zeros for any missing cells in the data matrix, and
then find the SVD of this matrix. Finally, the leading term in the SVD,
corresponding to the first PC, is used to approximate the missing values.
If the data matrix is column-centred, this is a variation on using means of
variables in place of missing values. Here there is the extra SVD step that
adjusts the mean values using information from other entries in the data
matrix.
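
The steps just described (zero-fill, SVD, leading term) can be sketched as follows. This is a minimal illustration rather than a reproduction of the method reported by Anderson et al. (1983). The function name `impute_rank_one` is hypothetical, missing cells are assumed to be coded as NaN, and the columns are assumed to be centred using the means of the observed entries, so that a zero in the centred matrix corresponds to mean substitution.

```python
import numpy as np

def impute_rank_one(X):
    """Fill NaN cells using the leading SVD term of the zero-filled, centred matrix.

    Assumptions (not from the text): missing cells are NaN, and each column
    has at least one observed value so that its mean can be computed.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Centre each column on the mean of its observed entries.
    col_means = np.nanmean(X, axis=0)
    Z = X - col_means

    # Substitute zeros for missing cells; in centred data this is
    # the same as substituting the column mean.
    Z[missing] = 0.0

    # SVD of the zero-filled matrix; keep only the leading term,
    # which corresponds to the first PC.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    rank_one = s[0] * np.outer(U[:, 0], Vt[0])

    # Replace only the missing cells, moving back to the original scale.
    filled = X.copy()
    filled[missing] = (rank_one + col_means)[missing]
    return filled
```

Observed entries are returned unchanged. Each imputed value is the column mean plus an adjustment from the first PC, which is where information from the other entries in the data matrix enters.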
Finally, note that there is a similarity of purpose in robust estimation of
PCs (see Section 10.4) to that present in handling missing data. In both
cases we identify particular observations which we cannot use in unadjusted
form, either because they are suspiciously extreme (in robust estimation),
or because they are not given at all (missing values). To completely ignore
such observations may throw away valuable information, so we attempt
to estimate ‘correct’ values for the observations in question. Similar
techniques may be relevant in each case. For example, we noted above the
possibility of imputing missing values for a particular observation by
regressing the missing variables on the variables present for that observation,
an idea that dates back at least to Beale and Little (1975), Frane (1976)
and Gleason and Staelin (1975) (see Jackson (1991, Section 14.1.5)). A
similar idea, namely robust regression of the variables on each other, is
included in Devlin et al.’s (1981) study of robust estimation of PCs (see
Section 10.4).
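
The regression idea mentioned above, imputing the missing variables of one observation from the variables present for it, can be sketched as follows. This is an illustrative sketch only, not the specific procedure of any of the cited papers. The function name `regress_impute_row` is hypothetical, and `mean` and `cov` are assumed to be estimates of the mean vector and covariance matrix obtained elsewhere, for example from the complete observations. A robust variant would use robust estimates in their place.

```python
import numpy as np

def regress_impute_row(x, mean, cov):
    """Fill NaNs in one observation by linear regression on its observed variables.

    x    : 1-D array with NaN marking missing variables
    mean : estimated mean vector (assumed supplied, e.g. from complete cases)
    cov  : estimated covariance matrix (assumed supplied in the same way)
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    miss = np.isnan(x)
    if not miss.any():
        return x.copy()
    obs = ~miss

    # Regression coefficients of missing on observed variables:
    # B = S_mo S_oo^{-1}, computed with a linear solve instead of an inverse.
    S_oo = cov[np.ix_(obs, obs)]
    S_mo = cov[np.ix_(miss, obs)]
    coef = np.linalg.solve(S_oo, S_mo.T).T

    # Predicted values: mean of the missing variables plus the regression
    # adjustment based on the observed deviations from their means.
    out = x.copy()
    out[miss] = mean[miss] + coef @ (x[obs] - mean[obs])
    return out
```

For two variables with zero means, unit variances and covariance 0.5, an observation with first variable 2 and second variable missing has the second variable imputed as 0.5 × 2 = 1.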
13.7 PCA in Statistical Process Control
The topic of this section, finding outliers, is closely linked to that of
Section 10.1, and many of the techniques used are based on those described
in that section. However, the literature on using PCA in multivariate
statistical process control (SPC) is sufficiently extensive to warrant its own
section. In various manufacturing processes improved technology means
that greater numbers of variables are now measured in order to monitor
whether or not a process is ‘in control.’ It has therefore become
increasingly relevant to use multivariate techniques for control purposes, rather
than simply to monitor each variable separately.
The main ways in which PCA is used in this context are (Martin et al.,
1999):

