Page 185 - Jolliffe I. Principal Component Analysis

7. Principal Component Analysis and Factor Analysis
                              chosen to maximize

                              $$Q = \sum_{k=1}^{m} \left[ \sum_{j=1}^{p} b_{jk}^{4} - \frac{1}{p} \left( \sum_{j=1}^{p} b_{jk}^{2} \right)^{2} \right]. \tag{7.2.2}$$
                              The terms in the square brackets are proportional to the variances of
                              squared loadings for each rotated factor. In the usual implementations of
                              factor analysis the loadings are necessarily between −1 and 1, so the
                              criterion tends to drive squared loadings towards the ends of the range 0 to 1,
                              and hence loadings towards −1, 0 or 1 and away from intermediate values,
                              as required. The quantity Q in equation (7.2.2) is the raw varimax criterion.
                              A normalized version is also used in which $b_{jk}$ is replaced by
                              $$\frac{b_{jk}}{\sqrt{\sum_{k=1}^{m} b_{jk}^{2}}}$$
                              in (7.2.2).
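
The criterion in (7.2.2) can be evaluated directly for any loading matrix. The following is a minimal sketch, not taken from the text: the function name `varimax_criterion`, its `normalize` flag, and the two toy matrices are assumptions made for illustration, and the normalized version is read as dividing each loading by the square root of its row's sum of squared loadings.

```python
import numpy as np

def varimax_criterion(B, normalize=False):
    """Raw (or row-normalized) varimax criterion Q of equation (7.2.2).

    B is a (p, m) array of loadings b_jk: p variables by m factors.
    """
    B = np.asarray(B, dtype=float)
    if normalize:
        # Assumed reading of the normalized version:
        # replace b_jk by b_jk / sqrt(sum_k b_jk^2), row by row.
        B = B / np.sqrt((B ** 2).sum(axis=1, keepdims=True))
    p = B.shape[0]
    sq = B ** 2
    # Sum over factors k of [ sum_j b_jk^4 - (1/p) * (sum_j b_jk^2)^2 ].
    return float(((sq ** 2).sum(axis=0) - sq.sum(axis=0) ** 2 / p).sum())

# Toy loadings (hypothetical): every variable loads on exactly one factor ...
simple = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
# ... versus every variable loading equally on both factors.
mixed = np.full((4, 2), np.sqrt(0.5))

q_simple = varimax_criterion(simple)
q_mixed = varimax_criterion(mixed)
```

In this toy case `q_simple` is larger than `q_mixed`, which is zero up to rounding, since all squared loadings in `mixed` are equal and so have no variance within a column. This matches the remark that the criterion favours loadings near −1, 0 or 1 over intermediate values.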
                                As discussed in Section 11.1, rotation can be applied to principal compo-
                              nent coefficients in order to simplify them, as is done with factor loadings.
                              The simplification achieved by rotation can help in interpreting the factors
                              or rotated PCs. This is illustrated nicely using diagrams (see Figures 7.1
                              and 7.2) in the simple case where only m = 2 factors or PCs are retained.
                              Figure 7.1 plots the loadings of ten variables on two factors. In fact, these
                              loadings are the coefficients $a_1$, $a_2$ for the first two PCs from the
                              example presented in detail later in the chapter, normalized so that
                              $a_k' a_k = l_k$, where $l_k$ is the kth eigenvalue of S, rather than
                              $a_k' a_k = 1$. When an orthogonal rotation method (varimax) is
                              performed, the loadings for the rotated
                              factors (PCs) are given by the projections of each plotted point onto the
                              axes represented by dashed lines in Figure 7.1.
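
The normalization and the projection onto rotated axes can be sketched for the m = 2 case as follows. This is an illustration under stated assumptions, not the book's example: the covariance matrix `S` is made up, the helper names `raw_varimax` and `rotate` are hypothetical, and the rotation angle is found by a simple grid search rather than by the iterative algorithms used in practice.

```python
import numpy as np

def raw_varimax(B):
    """Raw varimax criterion Q of (7.2.2) for a (p, m) loading matrix."""
    sq = B ** 2
    return float(((sq ** 2).sum(axis=0) - sq.sum(axis=0) ** 2 / B.shape[0]).sum())

def rotate(L, theta):
    """Project each row of L onto a pair of orthogonal axes rotated by theta."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.array([[c, -s], [s, c]])  # columns are the rotated unit axes
    return L @ T

# Hypothetical covariance matrix S for p = 4 variables (made up for illustration).
F = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
S = F @ F.T + 0.2 * np.eye(4)

# First two PCs, rescaled so that a_k' a_k = l_k instead of 1.
l, A = np.linalg.eigh(S)               # eigh returns eigenvalues in ascending order
l, A = l[::-1][:2], A[:, ::-1][:, :2]  # keep the two largest
L = A * np.sqrt(l)

# Grid search over the rotation angle; adequate for m = 2 only.
thetas = np.linspace(0.0, np.pi / 2, 901)
best = max(thetas, key=lambda t: raw_varimax(rotate(L, t)))
L_rot = rotate(L, best)
```

Because the rotation is orthogonal, each variable's sum of squared loadings is the same in `L` and `L_rot`; only the way it is split between the two axes changes.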
                                Similarly, rotation using an oblique rotation method (direct quartimin)
                              gives loadings after rotation by projecting onto the new axes shown in
                              Figure 7.2. It is seen that in Figure 7.2 all points lie close to one or other
                              of the axes, and so have near-zero loadings on the factor represented by
                              the other axis, giving a very simple structure for the loadings. The loadings
                              implied for the rotated factors in Figure 7.1, whilst having simpler structure
                              than the original coefficients, are not as simple as those for Figure 7.2, thus
                              illustrating the advantage of oblique, compared to orthogonal, rotation.
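
The advantage of oblique axes can be illustrated with a small sketch. Everything here is an assumption made for illustration: the four points are made up, the axes are placed by eye at 5 and 61 degrees rather than found by direct quartimin, and a loading is taken to be the coordinate of a point relative to the oblique axes (often called a pattern loading), which is one possible reading of the projection described above.

```python
import numpy as np

# Hypothetical loadings: two clusters of points roughly 56 degrees apart.
L = np.array([[0.90, 0.10], [0.80, 0.05], [0.45, 0.80], [0.40, 0.75]])

def axis_coordinates(L, angle1, angle2):
    """Coordinates of each row of L relative to unit axes at the given angles."""
    T = np.array([[np.cos(angle1), np.cos(angle2)],
                  [np.sin(angle1), np.sin(angle2)]])  # columns are the axes
    # Solve T c = x for every point x (row of L).
    return np.linalg.solve(T, L.T).T

def worst_small_loading(C):
    """Largest, over variables, of the smaller absolute loading in each row."""
    return float(np.abs(C).min(axis=1).max())

# Oblique axes passing through the two clusters.
oblique = axis_coordinates(L, np.deg2rad(5.0), np.deg2rad(61.0))

# Best that any orthogonal pair of axes can do, by grid search over the angle.
angles = np.linspace(0.0, np.pi / 2, 901)
best_orthogonal = min(
    worst_small_loading(axis_coordinates(L, t, t + np.pi / 2)) for t in angles
)
```

For these made-up points, `worst_small_loading(oblique)` is much smaller than `best_orthogonal`: axes at right angles cannot pass close to both clusters at once, whereas oblique axes can.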
                                Returning to the first stage in the estimation of Λ and Ψ, there is some-
                              times a problem with identifiability, meaning that the size of the data set
                              is too small compared to the number of parameters to allow those param-
                              eters to be estimated (Jackson, 1991, Section 17.2.6; Everitt and Dunn,
                              2001, Section 12.3). Assuming that identifiability is not a problem, there
                              are a number of ways of constructing initial estimates (see, for example,
                              Lewis-Beck (1994, Section II.2); Rencher (1998, Section 10.3); Everitt and
                              Dunn (2001, Section 12.2)). Some, such as the centroid method (see Cat-
                              tell, 1978, Section 2.3), were developed before the advent of computers and