Page 402 - Jolliffe I. Principal Component Analysis

13.7. PCA in Statistical Process Control
• One- or two-dimensional plots of PC scores. It was noted in Section 10.1 that both the first few and the last few PCs may be useful for detecting (different types of) outliers, and plots of both are used in process control. In the published discussion of Roes and Does (1995), Sullivan et al. (1995) argue that the last few PCs are perhaps more useful in SPC than the first few, but in their reply to the discussion Roes and Does disagree. If p is not too large, such arguments can be sidestepped by using a scatterplot matrix to display all two-dimensional plots of PC scores simultaneously. Plots can be enhanced by including equal-probability contours which, assuming approximate multivariate normality, serve as warning and action limits: points falling outside them trigger a warning or an action (Jackson, 1991, Section 1.7; Martin et al., 1999).
• Hotelling’s $T^2$. It was seen in Section 10.1 that this is a special case, for $q = p$, of the statistic $d_{2i}^2$ in equation (10.1.2). If multivariate normality is assumed, the distribution of $T^2$ is known, and control limits can be set based on that distribution (Jackson, 1991, Section 1.7).
• The squared prediction error (SPE). This is none other than the statistic $d_{1i}^2$ in equation (10.1.1). It was proposed by Jackson and Mudholkar (1979), who constructed control limits based on an approximation to its distribution. They prefer $d_{1i}^2$ to $d_{2i}^2$ for computational reasons and because of its intuitive appeal as a sum of squared residuals from the $(p - q)$-dimensional space defined by the first $(p - q)$ PCs. However, Jackson and Hearne (1979) indicate that the complement of $d_{2i}^2$, in which the sum of squares of the first few rather than the last few renormalized PCs is calculated, may be useful in process control when the objective is to look for groups of ‘out-of-control’ or outlying observations, rather than single outliers. Their basic statistic is decomposed to give separate information about variation within the sample (group) of potentially outlying observations, and about the difference between the sample mean and some known standard value. In addition, they propose an alternative statistic based on absolute, rather than squared, values of PCs. Jackson and Mudholkar (1979) also extend their proposed control procedure, based on $d_{1i}^2$, to the multiple-outlier case, and Jackson (1991, Figure 6.2) gives a sequence of significance tests for examining subgroups of observations in which each test is based on PCs in some way.
Eggett and Pulsipher (1989) compare $T^2$, SPE, and the complement of $d_{2i}^2$ suggested by Jackson and Hearne (1979) in a simulation study, and find the third of these statistics to be inferior to the other two. On the basis of their simulations, they recommend Hotelling’s $T^2$ for large samples, with SPE or univariate control charts preferred for small samples. They also discuss the possibility of constructing CUSUM charts based on the three statistics.
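
As a generic illustration of a CUSUM chart built on one of these statistics, and not as Eggett and Pulsipher's own construction, the sketch below applies a standard one-sided (upper) tabular CUSUM to a sequence of monitoring values such as successive $T^2$ or SPE values. The function name `upper_cusum`, the reference value `k`, and the decision interval `h` are hypothetical; in practice `k` and `h` would be tuned from the in-control distribution of the chosen statistic.

```python
def upper_cusum(stats, k, h):
    """One-sided tabular CUSUM on a sequence of monitoring statistics.

    C_t = max(0, C_{t-1} + stats[t] - k), starting from C_{-1} = 0; a signal is
    raised the first time C_t exceeds h. Returns the CUSUM path and the index
    of the first signal (None if the chart never signals).
    """
    c = 0.0
    path = []
    first_signal = None
    for t, s in enumerate(stats):
        c = max(0.0, c + s - k)      # accumulate only the excess over k
        path.append(c)
        if first_signal is None and c > h:
            first_signal = t
    return path, first_signal


# Usage on a hypothetical sequence of T^2 (or SPE) values:
path, first = upper_cusum([1.0, 2.0, 1.5, 6.0, 7.0, 8.0], k=4.0, h=5.0)
print(path, first)
```

Accumulating the excess over `k` makes the chart sensitive to small sustained shifts in the monitored statistic that a single-observation control limit might miss.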