Online Limited Memory Neural-Linear Bandits with Likelihood Matching
[1] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[2] Shun-ichi Amari. Understand in 5 Minutes!? Skimming Famous Papers: Jacot, Arthur, Gabriel, Franck and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2020 .
[3] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[4] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[5] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[6] Anton van den Hengel,et al. Semidefinite Programming , 2014, Computer Vision, A Reference Guide.
[7] Tianbao Yang,et al. Efficient Low-Rank Stochastic Gradient Descent Methods for Solving Semidefinite Programs , 2014, AISTATS.
[8] Shimon Whiteson,et al. VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning , 2020, ICLR.
[9] Quanquan Gu,et al. Neural Thompson Sampling , 2020, ICLR.
[10] Nahum Shimkin,et al. Deep Randomized Least Squares Value Iteration , 2019 .
[11] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[12] Jasper Snoek,et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling , 2018, ICLR.
[13] Kamyar Azizzadenesheli,et al. Efficient Exploration Through Bayesian Deep Q-Networks , 2018, 2018 Information Theory and Applications Workshop (ITA).
[14] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[15] Pedro M. Domingos,et al. Every Model Learned by Gradient Descent Is Approximately a Kernel Machine , 2020, ArXiv.
[16] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[17] Handong Zhao,et al. Neural Contextual Bandits with Deep Representation and Shallow Exploration , 2020, ICLR.
[18] Ye Zhang,et al. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification , 2015, IJCNLP.
[19] Shie Mannor,et al. Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning , 2018, NeurIPS.
[20] Quanquan Gu,et al. Neural Contextual Bandits with UCB-based Exploration , 2019, ICML.
[21] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[22] Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.
[23] Max Welling,et al. Distributed Inference for Latent Dirichlet Allocation , 2007, NIPS.
[24] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[25] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[26] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[27] Shie Mannor,et al. Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce , 2016, AAAI 2016.
[28] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[29] Shie Mannor,et al. Shallow Updates for Deep Reinforcement Learning , 2017, NIPS.
[30] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[31] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[32] Ian Osband,et al. The Uncertainty Bellman Equation and Exploration , 2017, ICML.