Learning Multiscale Representations of Natural Scenes Using Dirichlet Processes

We develop nonparametric Bayesian models for multiscale representations of images depicting natural scene categories. The marginal distribution of each feature or wavelet coefficient is described by a Dirichlet process (DP) mixture, which yields the heavy-tailed marginal distributions characteristic of natural images. Dependencies between features are then captured with a hidden Markov tree, and Markov chain Monte Carlo (MCMC) methods are used to learn models whose latent state space grows in complexity as more images are observed. By truncating the potentially infinite set of hidden states, we can exploit efficient belief propagation methods when learning these hierarchical Dirichlet process hidden Markov trees (HDP-HMTs) from data. We show that our generative models capture interesting qualitative structure in natural scenes, and that they categorize novel images more accurately than models that ignore spatial relationships among features.
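The abstract combines three ingredients: a truncated approximation to the DP, a hidden Markov tree that couples each coefficient's hidden state to its parent's, and belief propagation over the resulting finite state space. The Python/NumPy sketch below shows how these pieces can fit together. It is an illustration under stated assumptions, not the authors' implementation, and it omits the MCMC learning step. Assumptions: the DP is truncated to K states by stick-breaking; each parent-to-child transition row is drawn from Dirichlet(alpha * beta), as in a truncated HDP; each hidden state emits a zero-mean Gaussian whose variance is drawn from an inverse-gamma base measure, so the marginals are heavy-tailed Gaussian scale mixtures; the global weights beta also serve as the root-state distribution; and the tree is a toy complete quadtree. All function names are hypothetical.

```python
import numpy as np


def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, gamma), beta_k = v_k * prod_{j<k} (1 - v_j).
    Forcing v_K = 1 gives the leftover mass to the last state, so the K weights sum to one."""
    v = rng.beta(1.0, gamma, size=K)
    v[-1] = 1.0
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * leftover


def draw_hdp_hmt_params(K, gamma, alpha, rng):
    """Hypothetical prior draw: global weights beta, one parent-to-child transition row per
    state drawn from Dirichlet(alpha * beta), and one zero-mean Gaussian variance per state
    drawn from an (assumed) inverse-gamma base measure."""
    beta = stick_breaking(gamma, K, rng)
    conc = np.maximum(alpha * beta, 1e-8)  # floor tiny concentrations so Dirichlet is well defined
    A = np.vstack([rng.dirichlet(conc) for _ in range(K)])
    sigma2 = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=K)
    return beta, A, sigma2


def quadtree_parents(depth):
    """Parent array of a complete quadtree in breadth-first order; the root has parent -1."""
    n = sum(4 ** d for d in range(depth))
    return np.array([-1] + [(i - 1) // 4 for i in range(1, n)])


def sample_hmt(parent, root_probs, A, sigma2, rng):
    """Ancestral sampling: root state from root_probs, each child state from A[parent state]."""
    n, K = len(parent), len(root_probs)
    z = np.empty(n, dtype=int)
    z[0] = rng.choice(K, p=root_probs)
    for i in range(1, n):
        z[i] = rng.choice(K, p=A[z[parent[i]]])
    x = rng.normal(0.0, np.sqrt(sigma2[z]))  # coefficient x_i ~ N(0, sigma2[z_i])
    return z, x


def hmt_log_likelihood(x, parent, root_probs, A, sigma2):
    """Upward sum-product pass over the K truncated states: O(n K^2) instead of K**n terms.
    Requires parent[i] < i for every non-root node i."""
    x = np.asarray(x, dtype=float)
    # lik[i, k] = N(x_i; 0, sigma2_k)
    lik = np.exp(-0.5 * x[:, None] ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
    msg = lik.copy()  # msg[i, k] is proportional to p(observations in subtree of i | z_i = k)
    log_scale = 0.0
    for i in range(len(x) - 1, 0, -1):  # children have larger indices, so msg[i] is complete
        c = msg[i].sum()
        msg[i] /= c  # rescale to avoid underflow and keep the factor in log space
        log_scale += np.log(c)
        msg[parent[i]] *= A @ msg[i]  # sum_j A[k, j] * msg[i, j] for each parent state k
    return log_scale + np.log(root_probs @ msg[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta, A, sigma2 = draw_hdp_hmt_params(K=10, gamma=1.0, alpha=5.0, rng=rng)
    parent = quadtree_parents(depth=3)  # 1 + 4 + 16 = 21 nodes
    z, x = sample_hmt(parent, beta, A, sigma2, rng)
    print("log p(x) =", hmt_log_likelihood(x, parent, beta, A, sigma2))
```

Truncation is what makes the upward pass possible. With K explicit states, the message from a child to its parent is a length-K vector, so exact inference costs O(nK²) per tree rather than requiring enumeration of all Kⁿ joint state assignments. A sampler in the spirit of the abstract would alternate such message passing (for example, to draw hidden states) with updates of beta, A, and the emission variances.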
