Page 401 - Jolliffe I. Principal Component Analysis

13. Principal Component Analysis for Special Types of Data
                              taken by a candidate are therefore ‘missing.’ Shibayama (1990) devises a
                              method for producing a linear combination of the examination scores that
                              represents the overall performance of each candidate. When p′ = p the
                              method is equivalent to PCA.
                                Anderson et al. (1983) report a method that they attribute to Dear
                              (1959), which is not for dealing with missing values in a PCA, but which
                              uses PCA to impute missing data in a more general context. The idea seems
                              to be to first substitute zeros for any missing cells in the data matrix, and
                              then find the SVD of this matrix. Finally, the leading term in the SVD,
                              corresponding to the first PC, is used to approximate the missing values.
                              If the data matrix is column-centred, this is a variation on using means of
                              variables in place of missing values. Here there is the extra SVD step that
                              adjusts the mean values using information from other entries in the data
                              matrix.
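
As an illustration of the zero-substitution idea just described, the following Python sketch (using NumPy) column-centres the data using the observed entries only, puts zeros in the missing cells, and then replaces those cells by the leading SVD term plus the column mean. The function name `rank_one_svd_impute`, the use of NaN to mark missing cells, and the particular centring step are assumptions made for this example; they are not taken from Anderson et al. (1983) or Dear (1959), so this should be read as a sketch of the idea as summarised here rather than a reproduction of their procedure.

```python
import numpy as np

def rank_one_svd_impute(X):
    """Hypothetical helper: fill NaN cells using the leading SVD term.

    Assumes no column is entirely missing.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Column means computed from the observed entries only.
    col_means = np.nanmean(X, axis=0)

    # Column-centre, then substitute zeros for the missing cells.
    # A zero in the centred matrix corresponds to the column mean.
    Z = np.where(missing, 0.0, X - col_means)

    # Leading term of the SVD, corresponding to the first PC.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    rank_one = s[0] * np.outer(U[:, 0], Vt[0])

    # Observed cells are left untouched; missing cells receive
    # the column mean adjusted by the rank-one approximation.
    filled = X.copy()
    filled[missing] = (rank_one + col_means)[missing]
    return filled

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, np.nan],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])
filled = rank_one_svd_impute(X)
```

In this small example the second row lies below the column means on its two observed variables, so the imputed value is pulled below the mean of the third column; this is the sense in which the SVD step adjusts the mean using information from other entries.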
                                Finally, note that there is a similarity of purpose in robust estimation of
                              PCs (see Section 10.4) to that present in handling missing data. In both
                              cases we identify particular observations which we cannot use in unadjusted
                              form, either because they are suspiciously extreme (in robust estimation),
                              or because they are not given at all (missing values). To completely ignore
                              such observations may throw away valuable information, so we attempt
                              to estimate ‘correct’ values for the observations in question. Similar
                              techniques may be relevant in each case. For example, we noted above the
                              possibility of imputing missing values for a particular observation by
                              regressing the missing variables on the variables present for that observation,
                              an idea that dates back at least to Beale and Little (1975), Frane (1976)
                              and Gleason and Staelin (1975) (see Jackson (1991, Section 14.1.5)). A
                              similar idea, namely robust regression of the variables on each other, is
                              included in Devlin et al.’s (1981) study of robust estimation of PCs (see
                              Section 10.4).
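
The regression approach to imputation mentioned above can be sketched as follows in Python with NumPy. For each incomplete observation, the missing variables are predicted from the variables present, using ordinary least-squares regression coefficients derived from a mean vector and covariance matrix. Estimating these from the complete observations only, marking missing cells with NaN, and the function name `regression_impute` are simplifying assumptions made for this example; the sketch uses ordinary rather than robust regression and is not the procedure of any of the papers cited.

```python
import numpy as np

def regression_impute(X):
    """Hypothetical helper: impute NaN cells by regression on observed cells.

    Assumes there are enough complete rows for the covariance submatrix
    of the observed variables to be non-singular.
    """
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)

    # Mean and covariance estimated from complete observations only
    # (a simplification made for this sketch).
    mu = X[complete].mean(axis=0)
    S = np.cov(X[complete], rowvar=False)

    filled = X.copy()
    for i in np.flatnonzero(~complete):
        m = np.isnan(X[i])   # variables missing for this observation
        o = ~m               # variables present for this observation
        if not o.any():
            filled[i] = mu   # nothing observed: fall back to the means
            continue
        # Coefficients for regressing the missing variables on the present ones.
        B = np.linalg.solve(S[np.ix_(o, o)], S[np.ix_(o, m)])
        filled[i, m] = mu[m] + (X[i, o] - mu[o]) @ B
    return filled

# Third variable equals x1 + 2 * x2 in every complete row.
data = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 2.0],
                 [1.0, 1.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [0.0, 0.0, 0.0],
                 [3.0, 2.0, np.nan]])
imputed = regression_impute(data)
```

Because the third variable is an exact linear function of the first two in the complete rows, the regression recovers that relationship and applies it to the incomplete row.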




                              13.7 PCA in Statistical Process Control


                              The topic of this section, finding outliers, is closely linked to that of
                              Section 10.1, and many of the techniques used are based on those described
                              in that section. However, the literature on using PCA in multivariate
                              statistical process control (SPC) is sufficiently extensive to warrant its own
                              section. In various manufacturing processes, improved technology means
                              that greater numbers of variables are now measured in order to monitor
                              whether or not a process is ‘in control.’ It has therefore become
                              increasingly relevant to use multivariate techniques for control purposes, rather
                              than simply to monitor each variable separately.
                                The main ways in which PCA is used in this context are (Martin et al.,
                              1999):