M in (3.9.1)) is equal to $\Gamma^{-1}$ (see Besse (1994b)). Furthermore, it can be shown (Besse, 1994b) that $\Gamma^{-1}$ is approximately optimal even without the assumption of multivariate normality. Optimality is defined here as finding $\mathbf{Q}$ for which

$$E\Big[\frac{1}{n}\sum_{i=1}^{n}\|\mathbf{z}_i - \hat{\mathbf{z}}_i\|_A^2\Big]$$

is minimized, where $A$ is any given Euclidean metric. The matrix $\mathbf{Q}$ enters this expression because $\hat{\mathbf{z}}_i$ is the $\mathbf{Q}$-orthogonal projection of $\mathbf{x}_i$ onto the optimal $q$-dimensional subspace.
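Since $\|\mathbf{x} - \hat{\mathbf{z}}\|_{\mathbf{Q}} = \|\mathbf{Q}^{1/2}\mathbf{x} - \mathbf{Q}^{1/2}\hat{\mathbf{z}}\|$, the $\mathbf{Q}$-orthogonal projection can be computed by rotating the data into coordinates in which the metric becomes the ordinary one, taking the usual rank-$q$ SVD approximation there, and mapping back. The following numpy sketch illustrates this; the function name `metric_pca` and the column-centring step are assumptions made for the illustration, not notation from the text.

```python
import numpy as np

def metric_pca(X, Q, q):
    """Rank-q fit of the rows of X under the Euclidean metric Q (p x p, SPD).

    With Q = inv(Gamma), the fitted values are the Q-orthogonal projections
    of the (centred) x_i onto the optimal q-dimensional subspace.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                   # column-centre the data
    w, V = np.linalg.eigh(Q)                        # Q = V diag(w) V'
    Q_half = V @ np.diag(np.sqrt(w)) @ V.T          # symmetric square root of Q
    Q_half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Y = Xc @ Q_half                                 # coordinates where Q-metric is ordinary
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_hat = (U[:, :q] * s[:q]) @ Vt[:q]             # best rank-q approximation to Y
    return Y_hat @ Q_half_inv + mean                # map the fitted values back
```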
Of course, the model is often a fiction, and even when it might be believed, Γ will typically not be known. There are, however, certain types of data where plausible estimators exist for Γ. One is the case where the data fall into groups or clusters. If the groups are known, then within-group variation can be used to estimate Γ, and generalized PCA is equivalent to a form of discriminant analysis (Besse, 1994b). In the case of unknown clusters, Caussinus and Ruiz (1990) use a form of generalized PCA as a projection pursuit technique to find such clusters (see Section 9.2.2). Another form of generalized PCA is used by the same authors to look for outliers in a data set (Section 10.1).
Besse (1988) searches for an ‘optimal’ metric in a less formal manner. In the context of fitting splines to functional data, he suggests several families of metrics that combine elements of closeness between vectors with closeness between their smoothness. A family is indexed by a parameter playing a similar rôle to λ in equation (12.3.6), which governs smoothness. The optimal value of λ, and hence the optimal metric, is chosen to give the most clear-cut decision on how many PCs to retain.
Thacker (1996) independently came up with a similar approach, which he refers to as metric-based PCA. He assumes that associated with a set of $p$ variables $\mathbf{x}$ is a covariance matrix $\mathbf{E}$ for errors or uncertainties. If $\mathbf{S}$ is the covariance matrix of $\mathbf{x}$, then rather than finding $\mathbf{a}'\mathbf{x}$ that maximizes $\mathbf{a}'\mathbf{S}\mathbf{a}$, it may be more relevant to maximize $\frac{\mathbf{a}'\mathbf{S}\mathbf{a}}{\mathbf{a}'\mathbf{E}\mathbf{a}}$. This reduces to solving the eigenproblem

$$\mathbf{S}\mathbf{a}_k = l_k \mathbf{E}\mathbf{a}_k \qquad (14.2.6)$$

for $k = 1, 2, \ldots, p$.
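Equation (14.2.6) is a generalized symmetric eigenproblem, so it can be solved directly with a standard routine such as scipy's `eigh`. A minimal sketch, assuming $\mathbf{S}$ and $\mathbf{E}$ are supplied as numpy arrays with $\mathbf{E}$ positive definite (the function name is an assumption for illustration):

```python
import numpy as np
from scipy.linalg import eigh

def metric_based_pca(S, E):
    """Solve S a_k = l_k E a_k, k = 1, ..., p, as in (14.2.6).

    eigh returns eigenvalues in ascending order, with eigenvectors
    normalized so that a_k' E a_k = 1; reverse so that the largest
    variance ratio l_1 comes first.
    """
    l, A = eigh(S, E)           # generalized eigenproblem; E must be positive definite
    return l[::-1], A[:, ::-1]  # columns of A are the vectors a_k
```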

Second, third, and subsequent $\mathbf{a}_k$ are subject to the constraints $\mathbf{a}_h'\mathbf{E}\mathbf{a}_k = 0$ for $h < k$. In other words, $\mathbf{a}_1'\mathbf{x}, \mathbf{a}_2'\mathbf{x}, \ldots$ are uncorrelated with respect to the error covariance matrix. The eigenvalue $l_k$ corresponding to the eigenvector $\mathbf{a}_k$ is equal to the ratio $\mathbf{a}_k'\mathbf{S}\mathbf{a}_k / \mathbf{a}_k'\mathbf{E}\mathbf{a}_k$ of the variances of $\mathbf{a}_k'\mathbf{x}$ calculated using the overall covariance matrix $\mathbf{S}$ and the error covariance matrix $\mathbf{E}$, respectively. To implement the technique it is necessary to know $\mathbf{E}$, a similar difficulty to requiring knowledge of Γ to choose an optimal metric for the fixed effects model. Another way of viewing the optimization problem is that we maximize the variance $\mathbf{a}_k'\mathbf{S}\mathbf{a}_k$ of $\mathbf{a}_k'\mathbf{x}$ subject to the normalization constraint $\mathbf{a}_k'\mathbf{E}\mathbf{a}_k = 1$, so that the normalization is in terms of the error variance of $\mathbf{a}_k'\mathbf{x}$ rather than in terms of the length of $\mathbf{a}_k$, as in ordinary PCA.
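As a quick numerical check of these properties, one can verify with the sketch above that the $\mathbf{a}_k$ satisfy the stated constraints and that $l_k$ equals the variance ratio; the matrices below are random and purely illustrative:

```python
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 5))
S = B.T @ B / 50                            # an arbitrary covariance matrix
C = rng.standard_normal((50, 5))
E = C.T @ C / 50 + 0.1 * np.eye(5)          # an arbitrary error covariance matrix
l, A = metric_based_pca(S, E)
assert np.allclose(A.T @ E @ A, np.eye(5))  # a_h' E a_k = 0 for h != k, a_k' E a_k = 1
assert np.allclose(np.diag(A.T @ S @ A), l) # l_k = a_k' S a_k / a_k' E a_k
```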