Building a better probabilistic model of images by factorization

We describe a directed bilinear model that learns higher-order groupings among features of natural images. The model represents images in terms of two sets of latent variables: one set represents which feature groups are active, while the other specifies the relative activity within groups. Such a factorized representation is beneficial because it is stable under small variations in the placement of features while still preserving information about their relative spatial relationships. When trained on MNIST digits, the resulting representation provides state-of-the-art classification performance using only a simple classifier. When trained on natural images, the model learns to group features according to proximity in position, orientation, and scale. The model also achieves a high log-likelihood on natural images (−94 nats), surpassing the previous state of the art set by the mcRBM model.
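
To make the structure concrete, the following is a minimal NumPy sketch of a generic directed bilinear generative model of the kind described above: an image patch x is rendered as x = Σ_{i,j} B[:, i, j] c_i d_j plus noise, where c are the grouping variables and d the within-group coefficients. The weight tensor B, the dimensions, and the sparse prior on c are illustrative assumptions, not the paper's exact parameterization or training procedure.

```python
import numpy as np

# Illustrative sketch of a directed bilinear image model (assumed
# parameterization, not the paper's exact one). A patch x is generated
# from two sets of latent variables:
#   c -- grouping variables indicating which feature groups are active
#   d -- coefficients giving the relative activity of features within groups
# via a third-order weight tensor B:
#   x = sum_{i,j} B[:, i, j] * c[i] * d[j] + noise

rng = np.random.default_rng(0)

n_pix, n_groups, n_feats = 64, 10, 20          # assumed sizes for illustration
B = rng.normal(scale=0.1, size=(n_pix, n_groups, n_feats))

def generate(c, d, noise_std=0.01):
    """Render a patch from grouping variables c and coefficients d."""
    x = np.einsum('pij,i,j->p', B, c, d)       # bilinear combination
    return x + noise_std * rng.normal(size=n_pix)

# Sparse, non-negative group activations; dense within-group coefficients.
c = np.maximum(rng.normal(size=n_groups), 0.0) * (rng.random(n_groups) < 0.3)
d = rng.normal(size=n_feats)
x = generate(c, d)
```

In a model of this form, the factorization is what provides the stability noted above: a small shift in feature placement can be absorbed by the within-group coefficients d while the group variables c remain unchanged.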
