Rectified Factor Networks and Dropout

The success of deep learning techniques is based on their robust, effective, and abstract representations of the input. In particular, sparse representations obtained with rectified linear units and dropout have increased classification performance on various tasks. Deep architectures are often constructed by unsupervised pretraining and stacking of either restricted Boltzmann machines (RBMs) or autoencoders. We propose rectified factor networks (RFNs) for pretraining deep networks. In contrast to RBMs and autoencoders, RFNs (1) estimate the noise of each input component, (2) aim at decorrelating the hidden units (factors), and (3) estimate the precision of the hidden units via the posterior variance. In the E-step of an EM algorithm, RFN learning (i) enforces non-negative posterior means, (ii) allows dropout of hidden units, and (iii) normalizes the signal part of the hidden units. In the M-step, RFN learning applies gradient descent along the Newton direction, which permits rectification, dropout, and fast GPU implementations. RFN learning can be viewed as a variational EM algorithm with an unknown prior that is estimated while maximizing the likelihood. Using a fixed-point analysis, we show that RFNs explain the data variance in the same way factor analysis does. For new data, RFNs produce sparse and non-linear input representations by a linear mapping followed by rectification, and can therefore be readily used for pretraining deep networks. RFNs are tailored to making full use of large hidden layers, both in using all hidden units to code the input and in keeping computational complexity low. We tested and compared RFNs for unsupervised pretraining of deep learning on nine different benchmark datasets: MNIST, basic MNIST, bg-rand MNIST, bg-img MNIST, rect (tall vs. wide rectangles), rect-img
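
The abstract describes how codes for new data are produced: a linear mapping to the posterior means of the factors, followed by rectification, optional dropout of hidden units, and normalization. The NumPy sketch below illustrates this coding step under standard factor-analysis assumptions; the function name `rfn_code`, the parameter names `W` and `Psi`, the batch-wise unit-variance normalization, and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rfn_code(V, W, Psi, dropout_rate=0.0, rng=None):
    """Illustrative RFN-style coding step (a sketch, not the paper's code).

    V   : (n_samples, n_visible) centered inputs
    W   : (n_visible, n_hidden) factor loading matrix (assumed already learned)
    Psi : (n_visible,) diagonal noise variances of the input components
    """
    rng = np.random.default_rng() if rng is None else rng

    # Factor-analysis posterior mean: E[h | v] = W^T (W W^T + Psi)^{-1} v
    Sigma_v = W @ W.T + np.diag(Psi)
    H = V @ np.linalg.solve(Sigma_v, W)          # (n_samples, n_hidden)

    # (i) rectification: enforce non-negative posterior means
    H = np.maximum(H, 0.0)

    # (ii) dropout of hidden units
    if dropout_rate > 0.0:
        H = H * (rng.random(H.shape) >= dropout_rate)

    # (iii) normalization of the hidden units; a simple batch-wise scaling
    # to unit standard deviation stands in here for the paper's
    # normalization of the signal part.
    H = H / (H.std(axis=0) + 1e-8)
    return H

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.standard_normal((128, 50))           # toy centered inputs
    W = 0.1 * rng.standard_normal((50, 20))      # toy loading matrix
    Psi = np.ones(50)                            # unit noise variances
    H = rfn_code(V, W, Psi, dropout_rate=0.5, rng=rng)
    print(H.shape, float((H > 0).mean()))        # sparse, non-negative codes
```

Because the mapping to the hidden units is linear and the rectification is element-wise, coding new inputs scales well even with large hidden layers, which is what makes this representation convenient for stacking and pretraining.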
