Bilinear models of natural images

Previous work on unsupervised learning has shown that it is possible to learn Gabor-like feature representations, similar to those employed in the primary visual cortex, from the statistics of natural images. However, such representations are still not readily suited for object recognition or other high-level visual tasks because they can change drastically as the image changes due to object motion, variations in viewpoint, lighting, and other factors. In this paper, we describe how bilinear image models can be used to learn independent representations of the invariances and their transformations in natural image sequences. These models provide the foundation for learning higher-order feature representations that could serve as models of higher stages of processing in the cortex, in addition to having practical merit for computer vision tasks.
