Page 55 - Jolliffe I. Principal Component Analysis
2. Properties of Population Principal Components
the original variables rearranged in decreasing order of the size of their
variances. Also, the first few PCs account for little of the off-diagonal
elements of Σ in this case (see Property A3 above). In most circumstances,
such a transformation to PCs is of little value, and it will not occur if the
correlation, rather than covariance, matrix is used.
The example has shown that it is unwise to use PCs on a covariance
matrix when x consists of measurements of different types, unless there is a
strong conviction that the units of measurement chosen for each element of
x are the only ones that make sense. Even when this condition holds, using
the covariance matrix will not provide very informative PCs if the variables
have widely differing variances. Furthermore, with covariance matrices and
non-commensurable variables the PC scores are difficult to interpret—what
does it mean to add a temperature to a weight? For correlation matrices, the
standardized variates are all dimensionless and can be happily combined
to give PC scores (Legendre and Legendre, 1983, p. 129).
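
The dependence of covariance-based PCs on the units of measurement can be
illustrated numerically. The following is a minimal sketch in Python with
NumPy and is not part of the original text. The simulated temperature and
weight variables, the helper name first_pc, and the choice of kilograms and
grams are all assumptions made for illustration. It computes the first PC
from the covariance matrix and from the correlation matrix, once with weight
in kilograms and once with the same weights expressed in grams.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: two correlated variables of different types.
z1 = rng.standard_normal(n)
z2 = 0.6 * z1 + 0.8 * rng.standard_normal(n)
temp_c = 20.0 + 5.0 * z1        # temperature in degrees Celsius (assumed)
weight_kg = 70.0 + 10.0 * z2    # weight in kilograms (assumed)


def first_pc(x, use_correlation):
    """Return the first PC's coefficients and its share of total variance."""
    if use_correlation:
        m = np.corrcoef(x, rowvar=False)
    else:
        m = np.cov(x, rowvar=False)
    vals, vecs = np.linalg.eigh(m)          # eigenvalues in ascending order
    v = vecs[:, -1]                         # eigenvector of the largest one
    v = v * np.sign(v[np.argmax(np.abs(v))])  # fix the arbitrary sign
    return v, vals[-1] / vals.sum()


x_kg = np.column_stack([temp_c, weight_kg])
x_g = np.column_stack([temp_c, weight_kg * 1000.0])  # same data, weight in grams

cov_kg, share_cov_kg = first_pc(x_kg, use_correlation=False)
cov_g, share_cov_g = first_pc(x_g, use_correlation=False)
cor_kg, share_cor_kg = first_pc(x_kg, use_correlation=True)
cor_g, share_cor_g = first_pc(x_g, use_correlation=True)

print("covariance, kg: ", cov_kg, share_cov_kg)
print("covariance, g:  ", cov_g, share_cov_g)
print("correlation, kg:", cor_kg, share_cor_kg)
print("correlation, g: ", cor_g, share_cor_g)
```

Under these assumptions, the covariance-based first PC for the data in grams
is almost entirely the weight variable, whereas the correlation-based first
PC is the same in both unit systems.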
Another problem with the use of covariance matrices is that it is more
difficult than with correlation matrices to compare informally the results
from different analyses. Sizes of variances of PCs have the same implications
for different correlation matrices of the same dimension, but not for different
covariance matrices. Also, patterns of coefficients in PCs can be readily
compared for different correlation matrices to see if the two correlation
matrices are giving similar PCs, whereas informal comparisons are often
much trickier for covariance matrices. Formal methods for comparing PCs
from different covariance matrices are, however, available (see Section 13.5).
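
One way to see why PC variances are comparable across correlation matrices of
the same dimension is that the eigenvalues of any p x p correlation matrix sum
to p, its trace, whereas the eigenvalues of a covariance matrix sum to the
total variance, which depends on the units used. The following sketch, which
assumes Python with NumPy and uses two hypothetical data sets with arbitrarily
chosen scales, checks this numerically. The helper name pc_variances is
illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4

# Two hypothetical data sets with p variables each, on very different scales.
a = rng.standard_normal((200, p)) * np.array([1.0, 2.0, 3.0, 4.0])
b = rng.standard_normal((200, p)) * np.array([100.0, 0.5, 20.0, 7.0])


def pc_variances(x, use_correlation):
    """Return the PC variances (eigenvalues) in decreasing order."""
    if use_correlation:
        m = np.corrcoef(x, rowvar=False)
    else:
        m = np.cov(x, rowvar=False)
    return np.linalg.eigvalsh(m)[::-1]


cor_a = pc_variances(a, use_correlation=True)
cor_b = pc_variances(b, use_correlation=True)
cov_a = pc_variances(a, use_correlation=False)
cov_b = pc_variances(b, use_correlation=False)

# Correlation case: both sums equal p, so an eigenvalue of a given size
# represents the same proportion of total variation in either analysis.
print(cor_a.sum(), cor_b.sum())

# Covariance case: the totals differ, so raw eigenvalue sizes are not
# directly comparable between the two analyses.
print(cov_a.sum(), cov_b.sum())
```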
The use of covariance matrices does have one general advantage over
correlation matrices, and a particular advantage seen in a special case. The
general advantage is that statistical inference regarding population PCs
based on sample PCs is easier for covariance matrices than for correlation
matrices, as will be discussed in Section 3.7. This is relevant when PCA
is used in a context where statistical inference is important. However, in
practice, it is more common to use PCA as a descriptive, rather than an
inferential, tool, and then the potential advantage of covariance matrix
PCA is irrelevant.
The second advantage of covariance matrices holds in the special case
when all elements of x are measured in the same units. It can then be
argued that standardizing the elements of x to give correlations is
equivalent to making an arbitrary choice of measurement units. This argument
of arbitrariness can also be applied more generally to the use of correlation
matrices, but when the elements of x are measurements of different types,
the choice of measurement units leading to a covariance matrix is even
more arbitrary, so that the correlation matrix is again preferred.
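
The sense in which standardizing amounts to a choice of units can be made
concrete: the correlation matrix of x is the covariance matrix of the
variables after each has been rescaled to unit variance. The following is a
small sketch, assuming Python with NumPy and hypothetical simulated data, and
it is not drawn from the original text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: three variables in the same units but with unequal spread.
x = rng.standard_normal((300, 3)) * np.array([1.0, 5.0, 25.0])

# Standardize each variable to zero mean and unit sample variance.
# ddof=1 matches the divisor (n - 1) used by np.cov.
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

cov_of_standardized = np.cov(z, rowvar=False)
correlation = np.corrcoef(x, rowvar=False)

# The two matrices agree, so correlation-matrix PCA is covariance-matrix PCA
# carried out in units for which every variable has variance 1.
same = np.allclose(cov_of_standardized, correlation)
print(same)
```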
Standardizing the variables may be thought of as an attempt to remove
the problem of scale dependence from PCA. Another way of doing this is
to compute PCs of the logarithms of the original data (Flury, 1997, Section
8.4), though this is only feasible and sensible for restricted types of data,

