Decoding Stacked Denoising Autoencoders

Data representation in a stacked denoising autoencoder is investigated. Decoding is a simple technique for translating a stacked denoising autoencoder into a composition of denoising autoencoders in the ground space. In the infinitesimal limit, a composition of denoising autoencoders reduces to a continuous denoising autoencoder, which is rich in analytic properties and admits a geometric interpretation. For example, the continuous denoising autoencoder solves the backward heat equation and transports each data point so as to decrease the entropy of the data distribution. Combined with ridgelet analysis, an integral representation of a stacked denoising autoencoder is derived.
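
The infinitesimal-limit claim can be made concrete with a brief sketch, assuming Gaussian corruption and squared-error training; the map g_t and the densities p_t, p_s below are notation introduced here for illustration, not the paper's own symbols.

% A denoising autoencoder trained on \tilde{x} = x + \sqrt{t}\,\varepsilon with squared error
% has, at the optimum, the posterior-mean reconstruction, which by Tweedie's formula
% perturbs the identity along the score of the smoothed density:
\[
  g_t(x) \;=\; \mathbb{E}\bigl[x_0 \mid \tilde{x} = x\bigr]
         \;=\; x + t\,\nabla \log p_t(x),
  \qquad p_t = p_0 * \mathcal{N}(0, tI).
\]
% Stacking such layers with step t and letting t \to 0 gives the flow
\[
  \frac{\mathrm{d}x_s}{\mathrm{d}s} \;=\; \nabla \log p_s(x_s),
\]
% whose pushforward density, by the continuity equation, solves the backward heat
% equation; along this flow the differential entropy is non-increasing:
\[
  \partial_s p_s \;=\; -\nabla\!\cdot\!\bigl(p_s \nabla \log p_s\bigr) \;=\; -\Delta p_s,
  \qquad
  \frac{\mathrm{d}}{\mathrm{d}s} H(p_s) \;=\; -\int \frac{\lVert \nabla p_s \rVert^2}{p_s}\,\mathrm{d}x \;\le\; 0.
\]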
