Gradient Descent with Identity Initialization Efficiently Learns Positive-Definite Linear Transformations by Deep Residual Networks
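For orientation, the object named in the title is a deep linear residual network h(x) = (I + A_L) ··· (I + A_1) x trained by gradient descent from the identity initialization A_1 = ··· = A_L = 0 to match a positive-definite linear map Φ. Below is a minimal Python/NumPy sketch of that setting; the dimension, depth, step size, iteration count, and the squared Frobenius loss on the end-to-end map are illustrative assumptions, not the paper's exact algorithm or step-size schedule.

import numpy as np

rng = np.random.default_rng(0)
d, L, lr, steps = 5, 8, 0.005, 5000

# Random symmetric positive-definite target map Phi (assumed for illustration).
M = rng.standard_normal((d, d))
Phi = M @ M.T / d + np.eye(d)

I = np.eye(d)
A = [np.zeros((d, d)) for _ in range(L)]  # identity initialization: each layer is I + 0

for _ in range(steps):
    layers = [I + Ai for Ai in A]

    # End-to-end map W = (I + A_L) ... (I + A_1).
    W = I
    for Wi in layers:
        W = Wi @ W
    E = W - Phi  # residual of the end-to-end map

    # Gradient of 0.5 * ||W - Phi||_F^2 w.r.t. A_i is S_i^T E P_i^T, where
    # P_i is the product of the layers below layer i and S_i the product above it.
    prefix = [I]  # prefix[i] = product of layers below layer i (empty product = I)
    for Wi in layers[:-1]:
        prefix.append(Wi @ prefix[-1])
    suffix = [I]
    for Wi in layers[:0:-1]:
        suffix.append(suffix[-1] @ Wi)
    suffix.reverse()  # suffix[i] = product of layers above layer i

    for i in range(L):
        A[i] -= lr * suffix[i].T @ E @ prefix[i].T

# Recompute the end-to-end map and report the final loss.
W = I
for Ai in A:
    W = (I + Ai) @ W
print(f"final loss: {0.5 * np.sum((W - Phi) ** 2):.3e}")  # near zero at convergence

Because Φ here is symmetric and every layer starts at zero, all layers remain equal and symmetric along the whole trajectory, so each one converges to the matrix L-th root Φ^(1/L); this commutativity is an artifact of the toy setup.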
[1] W. Culver. On the existence and uniqueness of the real logarithm of a matrix, 1966.
[2] Charles R. Johnson, et al. Matrix Analysis, 1985, Cambridge University Press.
[3] Charles R. Johnson, et al. Topics in Matrix Analysis, 1991.
[4] Peter L. Bartlett, et al. Efficient agnostic learning of neural networks with bounded fan-in, 1996, IEEE Trans. Inf. Theory.
[5] D. Harville. Matrix Algebra From a Statistician's Perspective, 1998.
[6] Stephen P. Boyd, et al. Convex Optimization, 2004, Cambridge University Press.
[7] Charles R. Johnson, et al. Matrix Analysis, 2nd ed., 2012.
[8] Michael Unser, et al. Hessian Schatten-Norm Regularization for Linear Inverse Problems, 2012, IEEE Transactions on Image Processing.
[9] Aditya Bhaskara, et al. Provable Bounds for Learning Some Deep Representations, 2013, ICML.
[10] Roi Livni, et al. On the Computational Efficiency of Training Neural Networks, 2014, NIPS.
[11] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[12] Alexandr Andoni, et al. Learning Polynomials with Neural Networks, 2014, ICML.
[13] Yuchen Zhang, et al. L1-regularized Neural Networks are Improperly Learnable in Polynomial Time, 2015, ICML.
[14] Kenji Kawaguchi. Deep Learning without Poor Local Minima, 2016, NIPS.
[15] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Ohad Shamir, et al. On the Quality of the Initial Basin in Overspecified Neural Networks, 2015, ICML.
[17] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[18] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[19] Yi Zheng, et al. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis, 2017, ICML.
[20] Yuanzhi Li, et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation, 2017, NIPS.
[21] Anima Anandkumar, et al. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods, 2017.
[22] Amit Daniely. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[23] Prateek Jain, et al. Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot, 2015, AISTATS.
[24] Derong Liu, et al. Neural Information Processing, 2017, Lecture Notes in Computer Science.
[25] Rina Panigrahy, et al. Electron-Proton Dynamics in Deep Learning, 2017, ArXiv.
[26] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[27] Martin J. Wainwright, et al. On the Learnability of Fully-Connected Neural Networks, 2017, AISTATS.
[28] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[29] Amirhossein Taghvaei, et al. How regularization affects the critical points in linear networks, 2017, NIPS.
[30] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.
[31] Matthias Hein, et al. The Loss Surface of Deep and Wide Neural Networks, 2017, ICML.
[32] Tengyu Ma, et al. Learning One-hidden-layer Neural Networks with Landscape Design, 2017, ICLR.
[33] Xaq Pitkow, et al. Skip Connections Eliminate Singularities, 2017, ICLR.
[34] Philip M. Long, et al. Representing smooth functions as compositions of near-identity functions with implications for deep network optimization, 2018, ArXiv.
[35] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[36] Philip M. Long, et al. Gradient descent efficiently learns positive definite deep linear residual networks, 2018.
[37] Ohad Shamir, et al. Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks, 2018, COLT.