Modeling the joint density of two images under a variety of transformations

We describe a generative model of the relationship between two images. The model is defined as a factored three-way Boltzmann machine, in which hidden variables collaborate to define the joint correlation matrix for image pairs. Modeling the joint distribution over pairs makes it possible to efficiently match images that are the same according to a learned measure of similarity. We apply the model to several face matching tasks, and show that it learns to represent the input images using task-specific basis functions. Matching performance is superior to previous similar generative models, including recent conditional models of transformations. We also show that the model can be used as a plug-in matching score to perform invariant classification.

[1]  Charles R. Johnson,et al.  Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.

[2]  P. Ekman An argument for basic emotions , 1992 .

[3]  D. Ruderman,et al.  Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex , 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[4]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[5]  Yee Whye Teh,et al.  Rate-coded Restricted Boltzmann Machines for Face Recognition , 2000, NIPS.

[6]  Joshua B. Tenenbaum,et al.  Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[7]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[8]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[9]  Aleix M. Martinez Matching expression variant faces , 2003, Vision Research.

[10]  Rajesh P. N. Rao,et al.  Bilinear Sparse Coding for Invariant Vision , 2005, Neural Computation.

[11]  Rong Yan,et al.  Mining Associated Text and Images with Dual-Wing Harmoniums , 2005, UAI.

[12]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[14]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[15]  Geoffrey E. Hinton,et al.  Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Boris Babenko,et al.  Task Specific Local Region Matching , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Bruno A. Olshausen,et al.  Bilinear models of natural images , 2007, Electronic Imaging.

[18]  Yoshua Bengio,et al.  An empirical evaluation of deep architectures on problems with many factors of variation , 2007, ICML '07.

[19]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[20]  Jeffrey R. Spies,et al.  Mapping and Manipulating Facial Expression , 2009, Language and speech.

[21]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  Geoffrey E. Hinton,et al.  Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.

[23]  Geoffrey E. Hinton,et al.  Gated Softmax Classification , 2010, NIPS.

[24]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[25]  Erik G. Learned-Miller,et al.  Learning class-specific image transformations with higher-order Boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[26]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.