Principal Components

Suppose that we have a set of p-dimensional random variables x. Typically we will be in a situation where we have n observations of these random variables, and the parameters of their distribution, such as means, variances and covariances, are estimated from the values of the observations; but for the moment we think about the distribution of the underlying random variables rather than the particular observations.

The basic idea of principal components is to try to describe p-dimensional data in as small a number of dimensions (less than p) as possible, while preserving as much as possible of the structure involved. In doing this, we concentrate on variances. The first step is to look for a linear function $z_1 = \alpha_1^T x$, or $z_1 = \alpha_1^T (x - \mu)$, of the elements of x which has maximum variance, where $\alpha_1$ is a vector of p components $\alpha_{11}, \alpha_{21}, \ldots, \alpha_{p1}$ (beware that some other authors write the suffixes the other way round). Note that we often subtract the mean, and that for the maximization to be well defined $\alpha_1$ is conventionally normalized, e.g. so that $\alpha_1^T \alpha_1 = 1$. Then
$$\alpha_1^T x = \alpha_{11} x_1 + \alpha_{21} x_2 + \cdots + \alpha_{p1} x_p = \sum_{j=1}^{p} \alpha_{j1} x_j.$$

Next, we look for a linear function $z_2 = \alpha_2^T x$, uncorrelated with $\alpha_1^T x$, which has maximum variance subject to this condition, and so on, so that at the kth stage a linear function $z_k = \alpha_k^T x$ is found which has maximum variance subject to being uncorrelated with $\alpha_1^T x, \alpha_2^T x, \ldots, \alpha_{k-1}^T x$. We refer to the kth derived variable $\alpha_k^T x$ as the kth principal component. We stop after the mth stage when, in some sense, the majority of the variation has been accounted for.

Before proceeding further, let us recall some results on means and variances of linear functions. Suppose x is any p-dimensional random variable, and that a is any constant p-dimensional vector. Then the mean of $a^T x$ is
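As a concrete illustration of the construction just described, the following minimal sketch (not part of these notes) computes the vectors $\alpha_k$ numerically as eigenvectors of the sample covariance matrix, which is the standard computational route and anticipates a result derived later; the simulated data and the names X, S and Z are assumptions for the example only. It checks that the derived variables $z_1, \ldots, z_p$ are mutually uncorrelated and have decreasing variances, as the sequential maximization requires.

    # A minimal sketch: principal components via the eigendecomposition
    # of the sample covariance matrix. X is simulated data purely for
    # illustration; any n x p data matrix could be substituted.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 200, 3
    A = rng.normal(size=(p, p))          # mixing matrix, to induce correlation
    X = rng.normal(size=(n, p)) @ A.T    # n observations of p correlated variables

    S = np.cov(X, rowvar=False)          # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S) # eigh: eigendecomposition of symmetric S
    order = np.argsort(eigvals)[::-1]    # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Column k of eigvecs plays the role of alpha_{k+1}; the kth principal
    # component is z_k = alpha_k^T (x - mean), evaluated for each observation.
    Z = (X - X.mean(axis=0)) @ eigvecs

    print(np.round(Z.var(axis=0, ddof=1), 4))        # variances match eigvals
    print(np.round(np.corrcoef(Z, rowvar=False), 4)) # ~ identity: uncorrelated

The second printed matrix is approximately the identity, reflecting that each component is uncorrelated with all earlier ones, and the printed variances reproduce the eigenvalues in decreasing order.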