Page 404 - Jolliffe I. Principal Component Analysis

                                Apley and Shi (2001) assume that a vector of p measured features from
                              a process or product can be modelled as in the probabilistic PCA model of
                              Tipping and Bishop (1999a), described in Section 3.9. The vector therefore
                              has covariance matrix $\mathbf{B}\mathbf{B}' + \sigma^2 \mathbf{I}_p$, where in the present context the q
                              columns of B are taken to represent the effects of q uncorrelated faults
                              on the p measurements. The vectors of principal component coefficients
                              (loadings) that constitute the columns of B thus provide information about
                              the nature of the faults. To allow for the fact that the faults may not
                              be uncorrelated, Apley and Shi suggest that interpreting the faults may
                              be easier if the principal component loadings are rotated towards simple
                              structure (see Section 11.1).
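
The link between the columns of B and the principal component loadings can be checked numerically. The following is a minimal sketch, not Apley and Shi's procedure: the matrix `B`, its dimensions, the column scales and the noise variance `sigma2` are all hypothetical, and the columns of `B` are assumed to be orthogonal so that the leading eigenvectors coincide with them up to sign and scale.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, sigma2 = 6, 2, 0.1  # hypothetical sizes and noise variance

# Hypothetical fault-effect matrix B (p x q) with orthogonal columns:
# orthonormal directions Q, scaled to lengths 3 and 2.
Q, _ = np.linalg.qr(rng.standard_normal((p, q)))
B = Q * np.array([3.0, 2.0])

# Model covariance matrix BB' + sigma^2 I_p.
Sigma = B @ B.T + sigma2 * np.eye(p)

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The q leading eigenvalues are 9 + sigma2 and 4 + sigma2;
# the remaining p - q eigenvalues all equal sigma2.
print(eigvals)

# Each leading eigenvector matches a column direction of B up to sign.
print(np.abs(eigvecs[:, :q].T @ Q))
```

If the columns of `B` were not orthogonal, the leading eigenvectors would still span the same q-dimensional space as the columns of `B`, but would not in general equal them individually, which is one reason a rotation of the loadings may help interpretation.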
                              13.8 Some Other Types of Data

                              In this section we discuss briefly some additional types of data with special
                              features.
                              Vector-valued or Directional Data—Complex PCA
                              Section 12.2.3 discussed a special type of complex PCA in which the series
                              $\mathbf{x}_t + i\mathbf{x}_t^H$ is analysed, where $\mathbf{x}_t$ is a p-variate time series, $\mathbf{x}_t^H$ is its Hilbert
                              transform and $i = \sqrt{-1}$. More generally, if $\mathbf{x}_t$, $\mathbf{y}_t$ are two real-valued
                              p-variate series, PCA can be done on the complex series $\mathbf{x}_t + i\mathbf{y}_t$, and this
                              general form of complex PCA is relevant not just in a time series context,
                              but whenever two variables are recorded in each cell of the (n × p) data
                              matrix. This is then a special case of three-mode data (Section 14.5) for
                              which the index for the third mode takes only two values.
                                One situation in which such data arise is for landmark data (see
                              Section 13.2). Another is when the data consist of vectors in two dimensions,
                              as with directional data. A specific example is the measurement of wind,
                              which involves both strength and direction, and can be expressed as a
                              vector whose elements are the zonal (x or easterly) and meridional (y or
                              northerly) components.
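
As an illustration of how such two-dimensional vectors become complex data, the sketch below combines zonal and meridional components into a single complex matrix. The arrays `u` and `v` are made-up values for 4 observation times at 2 sites, not data from the text.

```python
import numpy as np

# Hypothetical zonal (u) and meridional (v) wind components:
# rows are observation times, columns are measurement sites.
u = np.array([[3.0, 0.0], [0.0, -1.0], [1.0, 1.0], [-2.0, 0.0]])
v = np.array([[4.0, 2.0], [1.0, 0.0], [1.0, -1.0], [0.0, 0.0]])

# Each cell of the (n x p) data matrix holds one complex number u + iv.
Z = u + 1j * v

# The modulus recovers wind strength; np.angle gives the angle in radians
# measured counter-clockwise from the positive x (zonal) axis.
speed = np.abs(Z)
angle = np.angle(Z)
```

The real and imaginary parts of `Z` are simply the two original variables, so no information is lost by the complex representation.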
                                Suppose that X is an (n × p) data matrix whose (h, j)th element is
                              $x_{hj} + iy_{hj}$. A complex covariance matrix is defined as

                              $$\mathbf{S} = \frac{1}{n-1}\mathbf{X}^{\dagger}\mathbf{X},$$

                              where $\mathbf{X}^{\dagger}$ is the conjugate transpose of X. Complex PCA is then done
                              by finding the eigenvalues and eigenvectors of S. Because S is Hermitian
                              the eigenvalues are real and can still be interpreted as proportions of total
                              variance accounted for by each complex PC. However, the eigenvectors are
                              complex, and the PC scores, which are obtained as in the real case by
                              multiplying the data matrix by the matrix of eigenvectors, are also complex.
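
The steps just described can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions rather than a definitive one: the function name `complex_pca` is hypothetical, the data are simulated, and the columns are centred inside the function on the assumption that the data matrix in the definition of S has already had its column means removed.

```python
import numpy as np

def complex_pca(Z):
    """Complex PCA of an (n x p) complex data matrix Z.

    Returns eigenvalues (descending), eigenvectors (as columns) and PC scores.
    """
    # Assumption: S is defined for column-centred data, so centre here.
    Zc = Z - Z.mean(axis=0)
    n = Zc.shape[0]

    # Complex covariance matrix S = X^dagger X / (n - 1); S is Hermitian.
    S = Zc.conj().T @ Zc / (n - 1)

    # eigh is intended for Hermitian matrices and returns real eigenvalues
    # in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Complex PC scores: data matrix times matrix of eigenvectors.
    scores = Zc @ eigvecs
    return eigvals, eigvecs, scores

# Simulated complex data, for illustration only.
rng = np.random.default_rng(1)
n, p = 200, 3
Z = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))

eigvals, eigvecs, scores = complex_pca(Z)

# Proportion of total variance accounted for by each complex PC.
prop = eigvals / eigvals.sum()
```

Using `np.linalg.eigh` rather than `np.linalg.eig` exploits the Hermitian structure of S, so the eigenvalues come back as real numbers. The variance of each complex score column, computed as the sum of squared moduli divided by n − 1, equals the corresponding eigenvalue.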