14.2. Weights, Metrics, Transformations and Centerings
M in (3.9.1)) is equal to $\Gamma^{-1}$ (see Besse (1994b)). Furthermore, it can be
shown (Besse, 1994b) that $\Gamma^{-1}$ is approximately optimal even without the
assumption of multivariate normality. Optimality is defined here as finding
$Q$ for which $E[\frac{1}{n}\sum_{i=1}^{n}\|\mathbf{z}_i - \hat{\mathbf{z}}_i\|_A^2]$ is minimized, where $A$ is any given
Euclidean metric. The matrix $Q$ enters this expression because $\hat{\mathbf{z}}_i$ is the
$Q$-orthogonal projection of $\mathbf{x}_i$ onto the optimal $q$-dimensional subspace.
Of course, the model is often a fiction, and even when it might be
believed, Γ will typically not be known. There are, however, certain types of
data where plausible estimators exist for Γ. One is the case where the data
fall into groups or clusters. If the groups are known, then within-group
variation can be used to estimate Γ, and generalized PCA is equivalent
to a form of discriminant analysis (Besse, 1994b). In the case of unknown
clusters, Caussinus and Ruiz (1990) use a form of generalized PCA as a
projection pursuit technique to find such clusters (see Section 9.2.2).
Another form of generalized PCA is used by the same authors to look for
outliers in a data set (Section 10.1).
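
As a concrete illustration of the grouped-data case, the following sketch estimates Γ by the pooled within-group covariance matrix and carries out a generalized PCA by solving the generalized eigenproblem $S\mathbf{a} = l\hat{\Gamma}\mathbf{a}$. This is only one way of realizing the idea described above, not Besse's (1994b) exact procedure; the function names, the pooling of group covariances, and the use of scipy are assumptions of this example.

import numpy as np
from scipy.linalg import eigh

def within_group_cov(X, groups):
    # Pooled within-group covariance, used here as an estimate of Gamma.
    p = X.shape[1]
    labels = np.unique(groups)
    W = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        W += (len(Xg) - 1) * np.cov(Xg, rowvar=False)
    return W / (len(X) - len(labels))

def generalized_pca(X, groups, q=2):
    S = np.cov(X, rowvar=False)            # overall covariance matrix
    Gamma_hat = within_group_cov(X, groups)
    # Solve S a = l Gamma_hat a; eigh returns eigenvalues in ascending
    # order, with eigenvectors normalized so that a' Gamma_hat a = 1.
    evals, evecs = eigh(S, Gamma_hat)
    order = np.argsort(evals)[::-1]        # largest ratio first
    return evals[order][:q], evecs[:, order][:, :q]

The directions found in this way are eigenvectors of $\hat{\Gamma}^{-1}S$, which is closely related to the eigenproblem solved in linear discriminant analysis and is consistent with the equivalence noted in the text.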
Besse (1988) searches for an ‘optimal’ metric in a less formal manner. In
the context of fitting splines to functional data, he suggests several families
of metric that combine elements of closeness between vectors with closeness
between their smoothness. A family is indexed by a parameter playing
a similar rôle to λ in equation (12.3.6), which governs smoothness. The
optimal value of λ, and hence the optimal metric, is chosen to give the
most clear-cut decision on how many PCs to retain.
Thacker (1996) independently came up with a similar approach, which
he refers to as metric-based PCA. He assumes that associated with a set
of p variables x is a covariance matrix E for errors or uncertainties. If S
is the covariance matrix of x, then rather than finding $\mathbf{a}'\mathbf{x}$ that maximizes
$\mathbf{a}'S\mathbf{a}$, it may be more relevant to maximize $\mathbf{a}'S\mathbf{a}/\mathbf{a}'E\mathbf{a}$. This reduces to solving
the eigenproblem
$$S\mathbf{a}_k = l_k E\mathbf{a}_k \qquad (14.2.6)$$
for $k = 1, 2, \ldots, p$.
Second, third, and subsequent $\mathbf{a}_k$ are subject to the constraints $\mathbf{a}'_h E\mathbf{a}_k = 0$
for $h < k$. In other words, $\mathbf{a}'_1\mathbf{x}, \mathbf{a}'_2\mathbf{x}, \ldots$ are uncorrelated with respect
to the error covariance matrix. The eigenvalue $l_k$ corresponding to the
eigenvector $\mathbf{a}_k$ is equal to the ratio of the variances $\mathbf{a}'_k S\mathbf{a}_k$, $\mathbf{a}'_k E\mathbf{a}_k$ of $\mathbf{a}'_k\mathbf{x}$
calculated using the overall covariance matrix S and the error covariance
matrix E, respectively. To implement the technique it is necessary to know
E, a similar difficulty to requiring knowledge of Γ to choose an optimal
metric for the fixed effects model. Another way of viewing the optimization
problem is that we maximize the variance $\mathbf{a}'_k S\mathbf{a}_k$ of $\mathbf{a}'_k\mathbf{x}$ subject to
the normalization constraint $\mathbf{a}'_k E\mathbf{a}_k = 1$, so that the normalization is in
terms of the error variance of $\mathbf{a}'_k\mathbf{x}$ rather than in terms of the length of $\mathbf{a}_k$,
as in ordinary PCA.
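
A minimal sketch of solving the eigenproblem (14.2.6) follows, assuming S and E are given symmetric matrices with E positive definite; the function name metric_based_pca and the example matrices are hypothetical, and scipy's generalized symmetric eigensolver is used for convenience.

import numpy as np
from scipy.linalg import eigh

def metric_based_pca(S, E):
    # eigh solves the generalized symmetric eigenproblem S a = l E a and
    # normalizes the eigenvectors so that a_k' E a_k = 1 and
    # a_h' E a_k = 0 for h != k, matching the constraints in the text.
    l, A = eigh(S, E)
    order = np.argsort(l)[::-1]            # largest variance ratio first
    return l[order], A[:, order]

# Example with an assumed (diagonal) error covariance matrix E.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
S = np.cov(X, rowvar=False)
E = np.diag([0.5, 1.0, 0.2, 0.8])
l, A = metric_based_pca(S, E)
# l[0] equals a_1' S a_1 / a_1' E a_1, the maximized variance ratio,
# because the columns of A satisfy a_k' E a_k = 1.

Because eigh already returns E-orthonormal eigenvectors, no separate normalization step is needed; this mirrors the normalization in terms of error variance described above.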

