Memory Efficient Adaptive Optimization
Yoram Singer | Vineet Gupta | Rohan Anil | Tomer Koren
[1] Yoram Singer,et al. Memory-Efficient Adaptive Optimization for Large-Scale Learning , 2019, ArXiv.
[2] Yoram Singer,et al. Shampoo: Preconditioned Stochastic Tensor Optimization , 2018, ICML.
[3] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[4] Moses Charikar,et al. Finding frequent items in data streams , 2004, Theor. Comput. Sci..
[5] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[6] Kunle Olukotun,et al. Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark , 2018, ACM SIGOPS Oper. Syst. Rev..
[7] Noam Shazeer,et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.
[8] Ankur Bapna,et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.
[9] Claudio Gentile,et al. Adaptive and Self-Confident On-Line Learning Algorithms , 2000, J. Comput. Syst. Sci..
[10] Yi Zhang,et al. The Case for Full-Matrix Adaptive Regularization , 2018, ArXiv.
[11] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[12] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Sanjiv Kumar,et al. On the Convergence of Adam and Beyond , 2018.
[15] Claudio Gentile,et al. On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.
[16] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[17] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[18] Matthew J. Streeter,et al. Adaptive Bound Optimization for Online Convex Optimization , 2010, COLT 2010.
[19] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..
[20] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[22] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim..
[23] Samy Bengio,et al. Tensor2Tensor for Neural Machine Translation , 2018, AMTA.
[24] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[25] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[26] Alok Aggarwal,et al. Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.
[27] Anastasios Kyrillidis,et al. Compressing Gradient Optimizers via Count-Sketches , 2019, ICML.