Active Roll-outs in MDP with Irreversible Dynamics
[1] Phuong Nguyen, et al. Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, 2013, ICML.
[2] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[3] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[4] Gerald Sommer, et al. Learning by biasing, 1998, Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No.98CH36146).
[5] Sarah Filippi, et al. Optimism in reinforcement learning and Kullback-Leibler divergence, 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[6] Steffen Udluft, et al. Safe exploration for reinforcement learning, 2008, ESANN.
[7] Guillaume Infantes, et al. Extending Classical Planning Heuristics to Probabilistic Planning with Dead-Ends, 2011, AAAI.
[8] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[9] Amir Massoud Farahmand, et al. Action-Gap Phenomenon in Reinforcement Learning, 2011, NIPS.
[10] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.
[11] Javier García, et al. Safe Exploration of State and Action Spaces in Reinforcement Learning, 2012, J. Artif. Intell. Res.
[12] Laurent Orseau, et al. Universal Knowledge-Seeking Agents for Stochastic Environments, 2013, ALT.
[13] Mausam, et al. A Theory of Goal-Oriented MDPs with Dead Ends, 2012, UAI.
[14] Paul E. Utgoff, et al. On integrating apprentice learning and reinforcement learning, 1996.
[15] Peter Stone, et al. Hierarchical model-based reinforcement learning: R-max + MAXQ, 2008, ICML '08.
[16] Alborz Geramifard, et al. UAV cooperative control with stochastic risk models, 2011, Proceedings of the 2011 American Control Conference.
[17] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[18] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[19] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[20] Manuela M. Veloso, et al. Interactive Policy Learning through Confidence-Based Autonomy, 2014, J. Artif. Intell. Res.