Benign overfitting in linear regression
Peter L. Bartlett | Philip M. Long | Gábor Lugosi | Alexander Tsigler