Training Neural Networks with Implicit Variance

We present a novel method for training predictive Gaussian distributions p(z|x) for regression problems with neural networks. While most approaches either ignore the variance or model it explicitly as an additional response variable, our method trains it implicitly. Stochasticity is established by injecting noise into the input and hidden units; the outputs are then approximated with a Gaussian distribution using the forward propagation method introduced for fast dropout [18]. The loss function is designed to respect this probabilistic interpretation of the output units. The method is evaluated on a synthetic task and an inverse robot dynamics task, yielding performance superior to plain neural networks, Gaussian processes, and locally weighted projection regression (LWPR) in terms of likelihood.
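As a rough illustration of the idea (a minimal sketch, not code from the paper), the snippet below propagates a mean and a variance through a linear layer whose inputs are subject to dropout noise, treats the resulting pre-activation as approximately Gaussian, and scores a target under that Gaussian with the negative log-likelihood that lets the variance be trained implicitly. The function names, keep probability, and toy dimensions are illustrative assumptions; moment propagation through nonlinear activations, which fast dropout [18] also handles, is omitted for brevity.

```python
import numpy as np

def dropout_linear_moments(mean_in, var_in, W, b, keep_prob=0.8):
    # With inputs x_i of mean m_i and variance v_i, each kept independently with
    # probability p (dropout), the pre-activation a_j = sum_i W_ji * z_i * x_i + b_j
    # has E[a] = p * W m + b and Var[a_j] = sum_i W_ji^2 * (p*v_i + p*(1-p)*m_i^2).
    p = keep_prob
    mean_out = p * (W @ mean_in) + b
    var_out = (W ** 2) @ (p * var_in + p * (1.0 - p) * mean_in ** 2)
    return mean_out, var_out

def gaussian_nll(y, mean, var, eps=1e-6):
    # Negative log-likelihood of y under N(mean, var); minimizing this trains the
    # predictive variance implicitly, without a separate variance output unit.
    var = var + eps
    return 0.5 * np.mean(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

# Toy usage with deterministic inputs (zero input variance) and a single linear layer.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
W, b = rng.normal(size=(1, 5)), np.zeros(1)

mean, var = dropout_linear_moments(x, np.zeros_like(x), W, b)
print("predicted mean:", mean, "variance:", var)
print("Gaussian NLL of target 0.3:", gaussian_nll(np.array([0.3]), mean, var))
```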

[1] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, arXiv.

[2] Stefan Schaal, et al. Incremental Online Learning in High Dimensions, 2005, Neural Computation.

[3] Geoffrey E. Hinton, et al. Training Recurrent Neural Networks, 2013.

[4] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.

[5] Jürgen Schmidhuber, et al. Multi-column deep neural networks for image classification, 2012, IEEE Conference on Computer Vision and Pattern Recognition.

[6] Dong Yu, et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[7] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.

[8] Steve Renals, et al. Deep Architectures for Articulatory Inversion, 2012, INTERSPEECH.

[9] Jeffrey K. Uhlmann, et al. New extension of the Kalman filter to nonlinear systems, 1997, Defense, Security, and Sensing.

[10] P. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

[11] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.

[12] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons, 2013, arXiv.

[13] Geoffrey E. Hinton, et al. Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes, 2007, NIPS.

[14] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[15] C. Bishop. Mixture density networks, 1994.

[16] Ruslan Salakhutdinov, et al. Learning Stochastic Feedforward Neural Networks, 2013, NIPS.

[17] R. Salakhutdinov, et al. A New Learning Algorithm for Stochastic Feedforward Neural Nets, 2013.

[18] Christopher D. Manning, et al. Fast dropout training, 2013, ICML.

[19] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.

[20] Alexander J. Smola, et al. Heteroscedastic Gaussian process regression, 2005, ICML.

[21] Yoshua Bengio, et al. An empirical evaluation of deep architectures on problems with many factors of variation, 2007, ICML.

[22] Radford M. Neal. Connectionist Learning of Belief Networks, 1992, Artificial Intelligence.

[23] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.

[24] Grégoire Montavon, et al. Neural Networks: Tricks of the Trade, 2012, Lecture Notes in Computer Science.

[25] Carl E. Rasmussen, et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.

[26] Y. Le Cun. Learning Process in an Asymmetric Threshold Network, 1986.

[27] Françoise Fogelman-Soulié, et al. Disordered Systems and Biological Organization, 1986, NATO ASI Series.

[28] Geoffrey E. Hinton, et al. On rectified linear units for speech processing, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing.

[29] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, Journal of Machine Learning Research.

[30] Yoshua Bengio, et al. Deep Generative Stochastic Networks Trainable by Backprop, 2013, ICML.