Understanding Deep Architectures using a Recursive Convolutional Network

A key challenge in designing convolutional network models is sizing them appropriately. Many factors are involved in these decisions, including the number of layers, the number of feature maps, kernel sizes, etc. Complicating this further is the fact that each of these choices influences not only the number and dimensions of the activation units, but also the total number of parameters. In this paper we focus on assessing the independent contributions of three of these linked variables: the number of layers, the number of feature maps, and the number of parameters. To accomplish this, we employ a recursive convolutional network whose weights are tied between layers; this allows us to vary each of the three factors in a controlled setting. We find that while increasing the number of layers and the number of parameters each has a clear benefit, the number of feature maps (and hence the dimensionality of the representation) appears ancillary, deriving most of its benefit from the additional weights it introduces. Our results (i) empirically confirm, in the context of convolutional layers, the notion that adding layers alone increases computational power, and (ii) suggest that the precise sizing of convolutional feature-map dimensions is itself of little concern; more attention should be paid instead to the number of parameters in these layers.
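To make the weight-tying construction concrete, the sketch below shows a minimal PyTorch-style recursive convolutional network. It is an illustrative assumption, not the paper's exact model: the class name, kernel sizes, activation, and pooling/classifier head are all hypothetical. What it demonstrates is the mechanism the abstract describes: reusing one shared convolution at every layer lets depth grow with no change in parameter count, while the number of feature maps controls both the representation width and the number of weights.

    import torch.nn as nn
    import torch.nn.functional as F

    class RecursiveConvNet(nn.Module):
        # Hypothetical sketch of a weight-tied ("recursive") convolutional network.
        def __init__(self, in_channels=3, num_maps=64, num_layers=4, num_classes=10):
            super().__init__()
            # Untied first layer: maps the input into the feature-map space.
            self.embed = nn.Conv2d(in_channels, num_maps, kernel_size=3, padding=1)
            # One shared convolution, reused at every subsequent layer (tied
            # weights), so the parameter count is independent of num_layers.
            self.shared = nn.Conv2d(num_maps, num_maps, kernel_size=3, padding=1)
            self.num_layers = num_layers
            self.classifier = nn.Linear(num_maps, num_classes)

        def forward(self, x):
            h = F.relu(self.embed(x))
            for _ in range(self.num_layers):
                h = F.relu(self.shared(h))  # same weights applied at every depth
            h = h.mean(dim=(2, 3))          # global average pooling over space
            return self.classifier(h)

    # Depth and parameter count vary independently: increasing num_layers
    # leaves the total number of weights unchanged, while increasing
    # num_maps adds weights.
    shallow = RecursiveConvNet(num_maps=64, num_layers=2)
    deep = RecursiveConvNet(num_maps=64, num_layers=8)
    assert sum(p.numel() for p in shallow.parameters()) == \
           sum(p.numel() for p in deep.parameters())

This decoupling is what allows the experiments to attribute performance changes to depth, feature-map count, or parameter count separately rather than to their usual confounded combination.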
