A case where a spindly two-layer linear network decisively outperforms any neural network with a fully connected input layer
[1] T. Tao. Topics in Random Matrix Theory, 2012.
[2] Manfred K. Warmuth, et al. The limits of squared Euclidean distance regularization, 2014, NIPS.
[3] Nathan Srebro, et al. Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity, 2020, COLT.
[4] S. V. N. Vishwanathan, et al. Leaving the Span, 2005, COLT.
[5] M. Meckes. Concentration of norms and eigenvalues of random matrices, 2002, math/0211192.
[6] Ohad Shamir, et al. Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks, 2016, ICML.
[7] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, Information Theory and Applications Workshop (ITA).
[8] Manfred K. Warmuth, et al. Winnowing with Gradient Descent, 2020, COLT.
[9] Manfred K. Warmuth, et al. Reparameterizing Mirror Descent as Gradient Descent, 2020, NeurIPS.
[10] Rocco A. Servedio, et al. Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms, 2001, NIPS.
[11] Manfred K. Warmuth, et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.
[12] Prashant Nalini Vasudevan, et al. XOR Codes and Sparse Learning Parity with Noise, 2019, SODA.
[13] Matus Telgarsky, et al. Benefits of Depth in Neural Networks, 2016, COLT.