Viewpoint invariant face recognition using independent component analysis and attractor networks

We have explored two approaches to recognizing faces across changes in pose. First, we developed a representation of face images based on independent component analysis (ICA) and compared it to a principal component analysis (PCA) representation for face recognition. The ICA basis vectors for this data set were more spatially local than the PCA basis vectors and the ICA representation had greater invariance to changes in pose. Second, we present a model for the development of viewpoint invariant responses to faces from visual experience in a biological system. The temporal continuity of natural visual experience was incorporated into an attractor network model by Hebbian learning following a lowpass temporal filter on unit activities. When combined with the temporal filter, a basic Hebbian update rule became a generalization of Griniasty et al. (1993), which associates temporally proximal input patterns into basins of attraction. The system acquired representations of faces that were largely independent of pose.

1 Independent component representations of faces

Important advances in face recognition have employed forms of principal component analysis, which considers only second-order moments of the input (Cottrell & Metcalfe, 1991; Turk & Pentland, 1991). Independent component analysis (ICA) is a generalization of principal component analysis (PCA), which decorrelates the higher-order moments of the input (Comon, 1994). In a task such as face recognition, much of the important information is contained in the high-order statistics of the images. A representational basis in which the high-order statistics are decorrelated may be more powerful for face recognition than one in which only the second-order statistics are decorrelated, as in PCA representations. We compared an ICA-based representation to a PCA-based representation for recognizing faces across changes in pose.

Figure 1: Examples from image set (Beymer, 1994).

The image set contained 200 images of faces, consisting of 40 subjects at each of five poses (Figure 1). The images were converted to vectors and comprised the rows of a 200 x 3600 data matrix, X. We consider the face images in X to be a linear mixture, X = AS, of an unknown set of statistically independent source images S, where A is an unknown mixing matrix (Figure 2). The sources are recovered by a matrix of learned filters, W, which produce statistically independent outputs, U = WX.
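
The linear mixing model described above can be illustrated with a minimal numerical sketch. It only demonstrates the shapes and the algebra of the model, written here as X = AS and U = WX. The sources, the mixing matrix, and the random seed are synthetic stand-ins rather than face data, and W is set to the exact inverse of a known A instead of being learned, so this is not the ICA learning procedure itself.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Dimensions taken from the text: 200 images (40 subjects x 5 poses),
# each flattened to a 3600-element row vector.
n_images, n_pixels = 200, 3600

# Hypothetical stand-in for the independent source images S:
# independent Laplacian (sparse) samples, one source image per row.
S = rng.laplace(size=(n_images, n_pixels))

# Hypothetical mixing matrix A, kept close to the identity so it is
# well conditioned. In the real problem A is unknown.
A = np.eye(n_images) + 0.01 * rng.standard_normal((n_images, n_images))

# Observed data matrix: each row of X is a linear mixture of the rows of S.
X = A @ S

# Stand-in for the learned filters W. Here we cheat and invert the known A;
# an ICA algorithm would instead estimate W from X alone.
W = np.linalg.inv(A)

# Recovered outputs U = W X. With W = A^{-1}, U reproduces S.
U = W @ X

max_error = np.max(np.abs(U - S))
print(X.shape, U.shape)
```

In practice W can only be recovered up to a permutation and scaling of its rows, so a learned W would reproduce the sources in some order and scale rather than exactly as in this idealized case.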