Learning real and complex overcomplete representations from the statistics of natural images

We show how an overcomplete dictionary may be adapted to the statistics of natural images so as to provide a sparse representation of image content. When the degree of overcompleteness is low, the basis functions that emerge resemble those of Gabor wavelet transforms. As the degree of overcompleteness is increased, new families of basis functions emerge, including multiscale blobs, ridge-like functions, and gratings. When the basis functions and coefficients are allowed to be complex, they provide a description of image content in terms of local amplitude (contrast) and phase (position) of features. These complex, overcomplete transforms may be adapted to the statistics of natural movies by imposing both sparseness and temporal smoothness on the amplitudes. The basis functions that emerge form Hilbert pairs such that shifting the phase of the coefficient shifts the phase of the corresponding basis function. This type of representation is advantageous because it makes explicit the structural and dynamic content of images, which in turn allows later stages of processing to discover higher-order properties indicative of image content. We demonstrate this point by showing that it is possible to learn the higher-order structure of dynamic phase (i.e., motion) from the statistics of natural image sequences.
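
The amplitude/phase description can be illustrated with a minimal one-dimensional sketch. It is not the learned model from the paper: the Gabor-shaped basis function, the parameters `sigma` and `k`, the function names, and the convention that a coefficient z contributes Re{conj(z) A(x)} to the image are all assumptions made for illustration. The real and imaginary parts of the basis function are a cosine and a sine under a shared envelope, which is the sense in which they form a quadrature (Hilbert) pair here.

```python
import cmath
import math

SIGMA = 2.0  # assumed envelope width (hypothetical value)
K = 1.5      # assumed spatial frequency (hypothetical value)

def complex_gabor(x, sigma=SIGMA, k=K):
    """Hypothetical complex basis function: Gaussian envelope times exp(i*k*x).

    Real part = even (cosine) component, imaginary part = odd (sine) component.
    """
    envelope = math.exp(-x * x / (2.0 * sigma * sigma))
    return envelope * cmath.exp(1j * k * x)

def contribution(x, amplitude, phase, sigma=SIGMA, k=K):
    """Image contribution of one coefficient z = amplitude * exp(i*phase).

    Assumed convention: I(x) = Re{ conj(z) * A(x) }.
    """
    z = amplitude * cmath.exp(1j * phase)
    return (z.conjugate() * complex_gabor(x, sigma, k)).real

# Sample positions and an arbitrary coefficient phase.
xs = [i * 0.1 for i in range(-50, 51)]
phase = math.pi / 3

signal = [contribution(x, 1.0, phase) for x in xs]

# Closed form: the envelope stays fixed while the carrier is shifted by `phase`.
expected = [
    math.exp(-x * x / (2.0 * SIGMA * SIGMA)) * math.cos(K * x - phase)
    for x in xs
]
max_err = max(abs(s - e) for s, e in zip(signal, expected))
```

Under these assumptions, changing `phase` slides the carrier beneath a fixed envelope, and `amplitude` scales local contrast without moving the feature. That separation is what lets sparseness and temporal smoothness be imposed on the amplitudes alone.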
