EXTENDING NETWORK NORMALIZATION SCHEMES

Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However, its success has been limited when applied to recurrent neural networks. Layer normalization, on the other hand, normalizes the activations across all activities within a layer, and was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques, as forms of divisive normalization, which includes layer and batch normalization as special cases. Our second contribution is the finding that a small modification to these normalization schemes, in conjunction with a sparse regularizer on the activations, leads to significant benefits over standard normalization techniques. We demonstrate the effectiveness of our unified divisive normalization framework in the context of convolutional neural networks and recurrent neural networks, showing improvements over baselines in image classification, language modeling, and super-resolution.
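To make the unified view concrete, below is a minimal NumPy sketch of a divisive normalization whose summation pool determines which special case it reduces to. The function name, the `sigma` smoothing constant, and the axis choices are illustrative assumptions, not the paper's exact formulation, which additionally includes learnable gain and bias parameters and the sparse regularizer on activations.

```python
import numpy as np

def divisive_normalization(x, axes, sigma=1e-3):
    """Generic divisive normalization over a chosen summation pool.

    x     : activations of shape (batch, features)
    axes  : axes defining the normalization pool --
            axis 0 (over the batch) gives a batch-norm-like statistic,
            axis 1 (over the layer) gives a layer-norm-like statistic.
    sigma : smoothing constant inside the square root (hypothetical name;
            plays the role of the usual epsilon).
    """
    mu = x.mean(axis=axes, keepdims=True)                 # subtractive (mean) term
    centered = x - mu
    denom = np.sqrt(sigma ** 2 + (centered ** 2).mean(axis=axes, keepdims=True))
    return centered / denom                               # divisive (gain-control) term

# Example: batch-norm-like vs. layer-norm-like pools on the same activations.
x = np.random.randn(32, 128).astype(np.float32)
bn_like = divisive_normalization(x, axes=0)  # normalize each feature over the mini-batch
ln_like = divisive_normalization(x, axes=1)  # normalize each example over its layer
```

The only difference between the two calls is the pool over which the mean and variance statistics are accumulated, which is the sense in which batch and layer normalization arise as special cases of the same divisive scheme.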
