Super-Resolution with Deep Convolutional Sufficient Statistics

Inverse problems in image and audio processing, and super-resolution in particular, can be seen as high-dimensional structured prediction problems, where the goal is to characterize the conditional distribution of a high-resolution output given its low-resolution, corrupted observation. When the scaling ratio is small, point estimates achieve impressive performance, but they soon suffer from regression to the mean, a consequence of their inability to capture the multi-modality of this conditional distribution. Modeling high-dimensional image and audio distributions is a hard task that requires capturing both complex geometric structures and textured regions. In this paper, we propose a conditional Gibbs distribution whose sufficient statistics are given by deep convolutional neural networks. The features computed by the network are stable to local deformations and have reduced variance when the input is a stationary texture. These properties imply that the resulting sufficient statistics minimize the uncertainty of the target signal given the degraded observation while remaining highly informative. The filters of the CNN are initialized with multiscale complex wavelets, and we propose an algorithm to fine-tune them by estimating the gradient of the conditional log-likelihood, which bears some similarity to Generative Adversarial Networks. We evaluate the proposed approach experimentally on image super-resolution; the approach is general, however, and could be applied to other challenging ill-posed problems such as audio bandwidth extension.
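As a concrete illustration of the conditional model described above, the Gibbs distribution and the associated inference problem can be written as below. This is a minimal sketch: the symbols Phi (the CNN sufficient statistics), f (a predictor of those statistics from the low-resolution input) and T (a temperature parameter) are notational assumptions made here for clarity, not necessarily the paper's exact formulation.

% Conditional Gibbs model with CNN sufficient statistics \Phi (sketch; notation assumed)
p(x \mid y) \;\propto\; \exp\!\Big( -\tfrac{1}{T}\, \big\| \Phi(x) - f(y) \big\|_2^2 \Big),
% where y is the low-resolution observation and f(y) estimates the
% sufficient statistics of the target x from y; a point estimate follows as
\hat{x}(y) \;=\; \arg\min_{x} \big\| \Phi(x) - f(y) \big\|_2^2 .

Under this reading, the stability and reduced variance of Phi quoted above mean that the posterior mass concentrates on signals sharing the same statistics as the target, rather than on their pixel-wise average, which is the regression-to-the-mean failure of plain point estimates.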
