Why and when can deep – but not shallow – networks avoid the curse of dimensionality: A review

The paper reviews and extends an emerging body of theoretical results on deep learning, including the conditions under which it can be exponentially better than shallow learning. A class of deep convolutional networks represents an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. Implications of a few key theorems are discussed, together with new results, open problems, and conjectures.
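As a rough sketch of the separation the reviewed theorems establish (stated informally here; the precise hypotheses on the network units, norms, and constants are given in the theorems themselves): approximating a generic n-variable function of smoothness m to accuracy ε with a one-hidden-layer network requires a number of units that grows exponentially with n, whereas a deep hierarchical network matched to a compositional target built from bivariate constituent functions of the same smoothness needs only a number of units linear in n,

\[
N_{\text{shallow}} = O\!\left(\varepsilon^{-n/m}\right),
\qquad
N_{\text{deep}} = O\!\left((n-1)\,\varepsilon^{-2/m}\right).
\]

Here N is the number of network units. The first bound exhibits the curse of dimensionality; the second avoids it because the architecture mirrors the binary-tree compositional structure of the target function, which is the sense in which depth, rather than weight sharing, drives the exponential advantage.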
