Peter Dayan | David Silver | Arthur Guez
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] J. J. Martin. Bayesian Decision Problems and Markov Chains, 1967.
[3] J. Moon. Random walks on random trees, 1973, Journal of the Australian Mathematical Society.
[4] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[5] M. Escobar, et al. Bayesian Density Estimation and Inference Using Mixtures, 1995.
[6] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[7] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[8] Tao Wang, et al. Bayesian sparse sampling for on-line reward optimization, 2005, ICML.
[9] Erik B. Sudderth. Graphical models for visual object recognition and tracking, 2006.
[10] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[11] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[12] Joelle Pineau, et al. Model-Based Bayesian Reinforcement Learning in Large Structured Domains, 2008, UAI.
[13] Finale Doshi-Velez, et al. The Infinite Partially Observable Markov Decision Process, 2009, NIPS.
[14] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[15] John Langford, et al. Agnostic active learning, 2006, J. Comput. Syst. Sci.
[16] Alessandro Lazaric, et al. Bayesian Multi-Task Reinforcement Learning, 2010, ICML.
[17] Joshua B. Tenenbaum, et al. Nonparametric Bayesian Policy Priors for Reinforcement Learning, 2010, NIPS.
[18] Peter Stone, et al. Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration, 2010, ECML/PKDD.
[19] Yee Whye Teh, et al. Bayesian Nonparametric Models, 2010, Encyclopedia of Machine Learning.
[20] Yee Whye Teh, et al. Dirichlet Process, 2017, Encyclopedia of Machine Learning and Data Mining.
[21] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[22] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.
[23] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[24] Thomas L. Griffiths, et al. The Indian Buffet Process: An Introduction and Review, 2011, J. Mach. Learn. Res.
[25] M. Littman, et al. Approaching Bayes-optimality using Monte-Carlo tree search, 2011.
[26] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[27] Leslie Pack Kaelbling, et al. Bayesian Policy Search with Policy Priors, 2011, IJCAI.
[28] Wei Chu, et al. An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, 2011.
[29] Peter Dayan, et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search, 2012, NIPS.
[30] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.
[31] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[32] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[33] Peter Dayan, et al. Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search, 2013, J. Artif. Intell. Res.