Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations