Page 403 - Jolliffe I. Principal Component Analysis

13. Principal Component Analysis for Special Types of Data
                                The control limits described so far are all based on the assumption of
                              approximate multivariate normality. Martin and Morris (1996) introduce
                              a non-parametric procedure that provides warning and action contours
                              on plots of PCs. These contours can be very different from the normal-
                              based ellipses. The idea of the procedure is to generate bootstrap samples
                              from the data set and from each of these calculate the value of a (pos-
                              sibly vector-valued) statistic of interest. A smooth approximation to the
                              probability density of this statistic is then constructed using kernel density
                              estimation, and the required contours are derived from this distribution.
                              Coleman (1985) suggests that when using PCs in quality control, the PCs
                              should be estimated robustly (see Section 10.4). Sullivan et al. (1995) do
                              this by omitting some probable outliers, identified from an initial scan of
                              the data, before carrying out a PCA.
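
The bootstrap and kernel density idea described above can be sketched roughly as follows. This is a minimal illustration and not Martin and Morris's (1996) procedure itself: the choice of statistic (the mean of the first two PC scores), the product-Gaussian kernel with a rule-of-thumb bandwidth, the 95% and 99% coverages for the warning and action contours, and all function names are assumptions made for the example. Contour heights are taken as quantiles of the estimated density evaluated at the bootstrap replicates, one common way of obtaining highest-density regions.

```python
import numpy as np

def gaussian_kde(points):
    """Return a function evaluating a product-Gaussian KDE built on `points` (n x d)."""
    n, d = points.shape
    # Rule-of-thumb bandwidth per coordinate (an assumption, not a tuned choice)
    h = points.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
    norm = n * np.prod(h) * (2 * np.pi) ** (d / 2)

    def density(x):
        x = np.atleast_2d(x)
        z = (x[:, None, :] - points[None, :, :]) / h      # shape (m, n, d)
        return np.exp(-0.5 * (z ** 2).sum(axis=2)).sum(axis=1) / norm

    return density

def bootstrap_contour_levels(scores, statistic, n_boot=1000,
                             coverages=(0.95, 0.99), seed=0):
    """Bootstrap `statistic`, smooth its distribution, and find contour heights."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    # Resample rows with replacement and recompute the statistic each time
    stats = np.array([statistic(scores[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    density = gaussian_kde(stats)
    dens_at_stats = density(stats)
    # A contour enclosing roughly a fraction c of the mass sits at the
    # (1 - c) quantile of the density values at the replicates
    levels = {c: np.quantile(dens_at_stats, 1.0 - c) for c in coverages}
    return density, levels

# Simulated stand-ins for in-control scores on the first two PCs
rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 2)) * np.array([2.0, 1.0])
density, levels = bootstrap_contour_levels(scores, lambda s: s.mean(axis=0))

new_value = np.array([1.0, 1.0])        # hypothetical new value of the statistic
beyond_action = density(new_value)[0] < levels[0.99]
```

A new value of the statistic lies outside the warning (or action) contour when its estimated density falls below the corresponding level, so no assumption of elliptical contours is needed.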
                                When a variable is used to monitor a process over time, its successive
                              values are likely to be correlated unless the spacing between observations is
                              large. One possibility for taking into account this autocorrelation is to plot
                              an exponentially weighted moving average of the observed values. Wold
                              (1994) suggests that similar ideas should be used when the monitoring
                              variables are PC scores, and he describes an algorithm for implementing
                              ‘exponentially weighted moving principal components analysis.’
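
As a rough illustration of the simpler idea mentioned above, the sketch below applies an ordinary exponentially weighted moving average to PC scores from a fixed PCA. It is not Wold's (1994) algorithm, whose details are not given here; the smoothing constant `lam`, the starting value of zero, and the function names are assumptions for the example.

```python
import numpy as np

def pc_scores(X, n_components):
    """Scores on the first `n_components` PCs of the column-centred data X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the PC coefficient vectors, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def ewma(Z, lam=0.2):
    """EWMA of each column of Z: w_t = lam * z_t + (1 - lam) * w_{t-1}, w_0 = 0."""
    out = np.empty_like(Z, dtype=float)
    prev = np.zeros(Z.shape[1])     # centred PC scores have mean zero
    for t in range(Z.shape[0]):
        prev = lam * Z[t] + (1.0 - lam) * prev
        out[t] = prev
    return out

# Simulated data standing in for 100 successive observations on 5 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pc_scores(X, 2)
Z_smooth = ewma(Z, lam=0.2)
```

Smaller values of `lam` give more weight to past observations, which smooths the plotted series more heavily.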
                                Data often arise in SPC for which, as well as different variables and differ-
                              ent times of measurement, there is a third ‘mode,’ namely different batches.
                              So-called multiway, or three-mode, PCA can then be used (see Section 14.5
                              and Nomikos and MacGregor (1995)). Grimshaw et al. (1998) note the
                              possible use of multiway PCA simultaneously on both the variables moni-
                              toring the process and the variables measuring inputs or initial conditions,
                              though they prefer a regression-based approach involving modifications of
                              Hotelling’s $T^2$ and the SPE statistic.
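
One common way of carrying out a multiway PCA on batch data is to unfold the three-way array into an ordinary two-way matrix, with one row per batch, and then apply a standard PCA. The sketch below assumes this batch-wise unfolding and an array laid out as (batches, variables, times); the function name and the simulated data are hypothetical.

```python
import numpy as np

def multiway_pca(X3, n_components):
    """PCA of a (batches x variables x times) array after batch-wise unfolding."""
    I, J, K = X3.shape
    # Each batch becomes one row; column j*K + k holds variable j at time k
    X2 = X3.reshape(I, J * K)
    Xc = X2 - X2.mean(axis=0)       # remove the mean trajectory of each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]             # one row per batch
    loadings = Vt[:n_components].reshape(n_components, J, K)    # refolded loadings
    return scores, loadings

# Simulated data: 20 batches, 3 variables, 10 time points
rng = np.random.default_rng(0)
X3 = rng.normal(size=(20, 3, 10))
scores, loadings = multiway_pca(X3, 2)
```

Each batch is summarized by a single row of scores, so batches can be compared on a plot of the first few PCs in the usual way.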
                                Boyles (1996) addresses the situation in which the number of variables
                              exceeds the number of observations. The sample covariance matrix S is
                              then singular and Hotelling’s $T^2$ cannot be calculated. One possibility is
                              to replace $S^{-1}$ by $\sum_{k=1}^{r} l_k^{-1} a_k a_k'$ for $r < n$, based on the first $r$ terms
                              in the spectral decomposition of S (the sample version of Property A3 in
                              Section 2.1). However, the data of interest to Boyles (1996) have variables
                              measured at points of a regular lattice on the manufactured product. This
                              structure implies that a simple pattern exists in the population covariance
                              matrix Σ. Using knowledge of this pattern, a positive definite estimate of
                              Σ can be calculated and used in $T^2$ in place of S. Boyles finds appropriate
                              estimates for three different regular lattices.
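
The first possibility mentioned above, replacing the inverse of S by the first r terms of its spectral decomposition, can be sketched as follows. This illustrates only that truncated form of T^2 and not Boyles's (1996) lattice-based estimates; the function name and the simulated data are assumptions for the example.

```python
import numpy as np

def truncated_t2(X, x_new, r):
    """T^2 with S^{-1} replaced by sum_{k=1}^{r} a_k a_k' / l_k (r < n)."""
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)             # singular when p > n
    l, A = np.linalg.eigh(S)                # eigenvalues in ascending order
    order = np.argsort(l)[::-1][:r]         # indices of the r largest eigenvalues
    l_r, A_r = l[order], A[:, order]
    # PC scores z_k = a_k'(x - xbar); then T^2 = sum_k z_k^2 / l_k
    z = (np.atleast_2d(x_new) - xbar) @ A_r
    return (z ** 2 / l_r).sum(axis=1)

# Simulated data with more variables (p = 25) than observations (n = 10)
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 25))
t2 = truncated_t2(X, X, r=5)
```

Because only the r largest eigenvalues are inverted, the zero eigenvalues of the singular S never enter the calculation.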
                                Lane et al. (2001) consider the case where several products or processes
                              are monitored simultaneously. They apply Flury’s common PC subspace
                              model (Section 13.5) to this situation. McCabe (1986) suggests the use
                              of principal variables (see Section 6.3) to replace principal components in
                              quality control.