Separating appearance from deformation

Representing images and image prototypes as linear subspaces spanned by "tangent vectors" (derivatives of an image with respect to translation, rotation, etc.) builds impressive invariance to known types of uniform distortion into feedforward discriminators. We describe a new probability model that jointly clusters data and learns mixtures of nonuniform, smooth deformation fields. The fields are built from low-frequency wavelets, so they need very few parameters to model a wide range of smooth deformations (unlike, e.g., factor analysis, which needs a large number of parameters to model deformations). In spirit, our approach is closest to the separation of content from style proposed by Tenenbaum and Freeman. However, our models do not need labeled data for training and thus allow unsupervised separation of appearance from deformation. We give results on handwritten digit recognition and face recognition.
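To make the parameter-count argument concrete, the following is a minimal sketch of a smooth deformation field built from a handful of low-frequency coefficients. It is not the model from the paper: a separable cosine basis stands in for the low-frequency wavelets, the deformation is applied through a first-order (tangent-style) approximation, and the names `low_freq_basis` and `deform` are hypothetical.

```python
import numpy as np

def low_freq_basis(size, n_freq):
    """Return a (size*size, n_freq*n_freq) matrix of smooth 2-D basis images.

    Assumption: separable low-frequency cosines are used here as a
    stand-in for the low-frequency wavelets mentioned in the abstract.
    """
    t = (np.arange(size) + 0.5) / size
    # Columns are 1-D cosines of increasing frequency; column 0 is constant.
    one_d = np.stack([np.cos(np.pi * k * t) for k in range(n_freq)], axis=1)
    # The Kronecker product forms all separable 2-D products of the 1-D functions.
    return np.kron(one_d, one_d)

def deform(image, coeff_x, coeff_y, basis):
    """Apply a smooth deformation using a first-order approximation.

    Assumption: deformed ~= image + dx * d(image)/dx + dy * d(image)/dy,
    where dx and dy are displacement fields synthesized from the basis.
    """
    size = image.shape[0]
    dx = (basis @ coeff_x).reshape(size, size)
    dy = (basis @ coeff_y).reshape(size, size)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    grad_y, grad_x = np.gradient(image)
    return image + dx * grad_x + dy * grad_y

if __name__ == "__main__":
    size, n_freq = 16, 3
    basis = low_freq_basis(size, n_freq)
    rng = np.random.default_rng(0)
    image = rng.random((size, size))
    coeff_x = 0.5 * rng.standard_normal(n_freq * n_freq)
    coeff_y = 0.5 * rng.standard_normal(n_freq * n_freq)
    warped = deform(image, coeff_x, coeff_y, basis)
    print("field parameters:", coeff_x.size + coeff_y.size)
    print("per-pixel displacements:", 2 * size * size)
```

In this sketch a 16 x 16 image is deformed with 18 coefficients (9 per axis) rather than 512 independent per-pixel displacements, which is the kind of saving the abstract attributes to low-frequency fields.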
