Reparameterizing Mirror Descent as Gradient Descent