[1] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.
[2] Jürgen Schmidhuber et al. Learning to Forget: Continual Prediction with LSTM, 2000, Neural Computation.
[3] Olivier Cappé et al. Algorithms for Non-Stationary Generalized Linear Bandits, 2020, arXiv.
[4] Peter Auer et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[5] Djallel Bouneffouf et al. A Survey on Practical Applications of Multi-Armed and Contextual Bandits, 2019, arXiv.
[6] Alessandro Lazaric et al. Linear Thompson Sampling Revisited, 2016, AISTATS.
[7] Shipra Agrawal et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[8] Olivier Cappé et al. Weighted Linear Bandits for Non-Stationary Environments, 2019, NeurIPS.
[9] Zeb Kurth-Nelson et al. Learning to Reinforcement Learn, 2016, CogSci.
[10] Peter L. Bartlett et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, arXiv.
[11] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[12] Guilherme A. Barreto et al. Short-Term Memory Mechanisms in Neural Network Learning of Robot Navigation Tasks: A Case Study, 2009, 6th Latin American Robotics Symposium (LARS 2009).
[13] Csaba Szepesvári et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[14] David Simchi-Levi et al. Learning to Optimize under Non-Stationarity, 2018, AISTATS.
[15] Christopher M. Bishop et al. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.
[16] Eric Moulines et al. On Upper-Confidence Bound Policies for Switching Bandit Problems, 2011, ALT.
[17] Jasper Snoek et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[18] Zheng Wen et al. Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit, 2018, AISTATS.
[19] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[20] Jürgen Schmidhuber et al. Long Short-Term Memory, 1997, Neural Computation.
[21] Fang Liu et al. A Change-Detection Based Framework for Piecewise-Stationary Multi-Armed Bandit Problem, 2017, AAAI.