Musings on Deep Learning: Properties of SGD
