Page 403 - Jolliffe I. Principal Component Analysis

13. Principal Component Analysis for Special Types of Data
The control limits described so far are all based on the assumption of
approximate multivariate normality. Martin and Morris (1996) introduce
a non-parametric procedure that provides warning and action contours
on plots of PCs. These contours can be very different from the normal-
based ellipses. The idea of the procedure is to generate bootstrap samples
from the data set and from each of these calculate the value of a (pos-
sibly vector-valued) statistic of interest. A smooth approximation to the
probability density of this statistic is then constructed using kernel density
estimation, and the required contours are derived from this distribution.
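The bootstrap-and-kernel idea can be sketched in a few lines. The following is a minimal illustration, not Martin and Morris's actual procedure: the function name `bootstrap_kde_levels`, the choice of statistic (the mean of the first two PC scores), the number of bootstrap samples, and the 95%/99% coverages for warning and action contours are all assumptions made for the example. It uses `scipy.stats.gaussian_kde` for the smooth density estimate and takes each contour to be the density level whose interior holds roughly the stated fraction of bootstrap replicates.

```python
import numpy as np
from scipy.stats import gaussian_kde


def bootstrap_kde_levels(scores, statistic, n_boot=500,
                         coverages=(0.95, 0.99), seed=0):
    """Return a KDE of a bootstrapped statistic and density levels for contours.

    scores    : (n, q) array of PC scores from the reference data
    statistic : function mapping an (n, q) array to a length-d vector (d >= 1)
    """
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    # Each replicate is the statistic evaluated on n rows drawn with replacement.
    reps = np.array([
        np.atleast_1d(statistic(scores[rng.integers(0, n, size=n)]))
        for _ in range(n_boot)
    ])
    # gaussian_kde expects data with shape (dimension, number_of_points).
    kde = gaussian_kde(reps.T)
    dens = kde(reps.T)
    # The level for coverage c is the (1 - c) quantile of the densities at the
    # replicates, so about a fraction c of replicates lie inside the contour.
    levels = {c: float(np.quantile(dens, 1.0 - c)) for c in coverages}
    return kde, levels


# Illustrative data: scores on the first two PCs of simulated observations.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc_scores = Xc @ Vt[:2].T

kde, levels = bootstrap_kde_levels(pc_scores, lambda s: s.mean(axis=0))
```

Evaluating `kde` on a grid over the PC plot and drawing the contour lines at `levels[0.95]` and `levels[0.99]` would give warning and action contours in the sense assumed here. Unlike normal-based ellipses, these contours follow whatever shape the estimated density takes.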
Coleman (1985) suggests that when using PCs in quality control, the PCs
should be estimated robustly (see Section 10.4). Sullivan et al. (1995) do
this by omitting some probable outliers, identified from an initial scan of
the data, before carrying out a PCA.
When a variable is used to monitor a process over time, its successive
values are likely to be correlated unless the spacing between observations is
large. One possibility for taking into account this autocorrelation is to plot
an exponentially weighted moving average of the observed values. Wold
(1994) suggests that similar ideas should be used when the monitoring
variables are PC scores, and he describes an algorithm for implementing
‘exponentially weighted moving principal components analysis.’
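As a simple illustration of the first idea, an exponentially weighted moving average can be applied to a series of PC scores. This sketch shows only the basic smoothing recursion z_t = λx_t + (1 − λ)z_{t−1}; it is not Wold's algorithm, which is not described here. The function name `ewma`, the weight `lam`, and the starting-value convention are assumptions for the example.

```python
import numpy as np


def ewma(values, lam=0.2, start=None):
    """Exponentially weighted moving average along the first (time) axis.

    values : 1-D series, or 2-D array with one row per time point
    lam    : weight in (0, 1] given to the newest observation
    start  : initial value z_0; defaults to the first observation
    """
    values = np.asarray(values, dtype=float)
    z = np.empty_like(values)
    prev = values[0] if start is None else start
    for t, x in enumerate(values):
        # New average = lam * newest value + (1 - lam) * previous average.
        prev = lam * x + (1.0 - lam) * prev
        z[t] = prev
    return z


# Illustrative use: smooth a simulated, autocorrelated PC-score series.
rng = np.random.default_rng(0)
pc1 = np.cumsum(rng.standard_normal(100)) * 0.1
smoothed = ewma(pc1, lam=0.2, start=0.0)
```

Smaller values of `lam` give heavier smoothing and a longer memory of past observations. Because PC scores computed from centred data have mean zero, `start=0.0` is a natural choice of target in this setting.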
Data often arise in SPC for which, as well as different variables and differ-
ent times of measurement, there is a third ‘mode,’ namely different batches.
So-called multiway, or three-mode, PCA can then be used (see Section 14.5
and Nomikos and MacGregor (1995)). Grimshaw et al. (1998) note the
possible use of multiway PCA simultaneously on both the variables moni-
toring the process and the variables measuring inputs or initial conditions,
though they prefer a regression-based approach involving modifications of
Hotelling’s $T^2$ and the SPE statistic.
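One simple way to apply ordinary PCA to a three-mode array is to unfold it into a two-way matrix first. The sketch below assumes batch-wise unfolding, with each batch becoming one row that holds every variable at every time point. This is an illustrative choice and may differ from the variants discussed in Section 14.5; the function names and the array layout (batches × variables × times) are assumptions.

```python
import numpy as np


def unfold_batchwise(data):
    """Reshape a (batches, variables, times) array to (batches, variables*times)."""
    n_batches, n_vars, n_times = data.shape
    return data.reshape(n_batches, n_vars * n_times)


def pca_scores(matrix, n_components):
    """Scores on the leading PCs of a column-centred two-way matrix."""
    centred = matrix - matrix.mean(axis=0)
    # Rows of vt are the PC coefficient vectors, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


# Illustrative data: 20 batches, 3 variables, 50 time points per batch.
rng = np.random.default_rng(0)
batches = rng.standard_normal((20, 3, 50))
batch_scores = pca_scores(unfold_batchwise(batches), n_components=2)
```

Centring the unfolded matrix by column subtracts the average trajectory of each variable over the batches, so the scores describe how each batch departs from that average.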
Boyles (1996) addresses the situation in which the number of variables
exceeds the number of observations. The sample covariance matrix S is
then singular and Hotelling’s $T^2$ cannot be calculated. One possibility is
to replace $S^{-1}$ by $\sum_{k=1}^{r} l_k^{-1} a_k a_k'$ for $r < n$, based on the first $r$ terms
in the spectral decomposition of S (the sample version of Property A3 in
Section 2.1). However, the data of interest to Boyles (1996) have variables
measured at points of a regular lattice on the manufactured product. This
structure implies that a simple pattern exists in the population covariance
matrix Σ. Using knowledge of this pattern, a positive definite estimate of
Σ can be calculated and used in $T^2$ in place of S. Boyles finds appropriate
estimates for three different regular lattices.
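The first possibility mentioned above, replacing the inverse of S by the first r terms of its spectral decomposition, can be sketched as follows. With l_k and a_k the eigenvalues and eigenvectors of S, the resulting statistic for an observation x is the sum over k = 1, ..., r of (a_k'(x − x̄))² / l_k. The function name `truncated_t2` and the simulated data are assumptions for illustration; this is not Boyles's lattice-based estimator.

```python
import numpy as np


def truncated_t2(X, x_new, r):
    """T^2-type statistic using the r leading eigenpairs of the sample covariance.

    X     : (n, p) reference data; p may exceed n
    x_new : (p,) or (m, p) observation(s) to be monitored
    r     : number of terms retained, with r < n
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False)            # singular when p >= n
    l, A = np.linalg.eigh(S)               # eigenvalues in ascending order
    order = np.argsort(l)[::-1][:r]        # indices of the r largest eigenvalues
    l_r, A_r = l[order], A[:, order]
    z = (np.atleast_2d(x_new) - mean) @ A_r    # scores on the r leading PCs
    # (x - mean)' [sum_k a_k a_k' / l_k] (x - mean) = sum_k z_k^2 / l_k
    return np.sum(z ** 2 / l_r, axis=1)


# Illustrative case with more variables (p = 10) than observations (n = 6).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 10))
t2_values = truncated_t2(X, X, r=3)
```

When S is non-singular and r equals the number of variables, the sum contains every term of the spectral decomposition and the statistic coincides with the usual Hotelling's T² based on the inverse of S.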
Lane et al. (2001) consider the case where several products or processes
are monitored simultaneously. They apply Flury’s common PC subspace
model (Section 13.5) to this situation. McCabe (1986) suggests the use
of principal variables (see Section 6.3) to replace principal components in
quality control.
