Off-policy Model-based Learning under Unknown Factored Dynamics
[1] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning, 1998, ICML.
[2] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[3] Dale Schuurmans, et al. Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs, 2002, ICML.
[4] Benjamin Van Roy, et al. Near-optimal Reinforcement Learning in Factored MDPs, 2014, NIPS.
[5] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[6] Olivier Sigaud, et al. Learning the structure of Factored Markov Decision Processes in reinforcement learning problems, 2006, ICML.
[7] Louis Wehenkel, et al. Model-Free Monte Carlo-like Policy Evaluation, 2010, AISTATS.
[8] Adel M. Alimi, et al. Dynamic MMHC: A Local Search Algorithm for Dynamic Bayesian Network Structure Learning, 2013, IDA.
[9] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[10] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[11] Lihong Li, et al. On Minimax Optimal Offline Policy Evaluation, 2014, ArXiv.
[12] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[13] Kevin P. Murphy, et al. Learning the Structure of Dynamic Probabilistic Networks, 1998, UAI.
[14] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[17] Matthew Richardson, et al. Predicting clicks: estimating the click-through rate for new ads, 2007, WWW '07.
[18] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[19] Lihong Li, et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning, 2009, ICML '09.
[20] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[21] Peter Stone, et al. Structure Learning in Ergodic Factored MDPs without Knowledge of the Transition Function's In-Degree, 2011, ICML.
[22] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.
[23] Peter Stone, et al. Generalized model learning for reinforcement learning in factored domains, 2009, AAMAS.
[24] Michael L. Littman, et al. Efficient Structure Learning in Factored-State MDPs, 2007, AAAI.
[25] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[26] Peter Stone, et al. Model-based function approximation in reinforcement learning, 2007, AAMAS '07.
[27] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.