Transport Analysis of Infinitely Deep Neural Network