Approximate Value Iteration with Temporally Extended Actions
[1] Edsger W. Dijkstra,et al. A note on two problems in connexion with graphs , 1959, Numerische Mathematik.
[2] H. Scarf. The Optimality of (s, S) Policies in the Dynamic Inventory Problem , 1959 .
[3] Nils J. Nilsson,et al. A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..
[4] Jose Augusto Ramos Soares,et al. Graph Spanners: a Survey , 1992 .
[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[6] Leslie Pack Kaelbling,et al. On the Complexity of Solving Markov Decision Problems , 1995, UAI.
[7] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[8] Suresh P. Sethi,et al. Optimality of (s, S) Policies in Inventory Models with Markovian Demand , 1995, Oper. Res..
[9] Doina Precup,et al. Multi-time Models for Temporally Abstract Planning , 1997, NIPS.
[10] Doina Precup,et al. Theoretical Results on Reinforcement Learning with Temporally Abstract Options , 1998, ECML.
[11] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[12] Jesse Hoey,et al. SPUDD: Stochastic Planning using Decision Diagrams , 1999, UAI.
[13] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[14] Doina Precup,et al. Learning Options in Reinforcement Learning , 2002, SARA.
[15] S. Minner. Multiple-supplier inventory models in supply chain management: A review , 2003 .
[16] Shie Mannor,et al. Dynamic abstraction in reinforcement learning via clustering , 2004, ICML.
[17] Glenn A. Iba,et al. A Heuristic Approach to the Discovery of Macro-Operators , 1989, Machine Learning.
[18] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[19] Andrew G. Barto,et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning , 2004, ICML.
[20] Alicia P. Wolfe,et al. Identifying useful subgoals in reinforcement learning by local graph partitioning , 2005, ICML.
[21] Jean-Claude Latombe,et al. Landmark-Based Robot Navigation , 1992, Algorithmica.
[22] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[23] Peter Stone,et al. Reinforcement Learning for RoboCup Soccer Keepaway , 2005, Adapt. Behav..
[24] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.
[25] Peter Sanders,et al. Highway Hierarchies Hasten Exact Shortest Path Queries , 2005, ESA.
[26] Manuela M. Veloso,et al. Probabilistic policy reuse in a reinforcement learning agent , 2006, AAMAS '06.
[27] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[28] Andrew G. Barto,et al. Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.
[29] Robert Givan,et al. FF-Replan: A Baseline for Probabilistic Planning , 2007, ICAPS.
[30] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[31] Nicholas Roy,et al. CORL: A Continuous-state Offset-dynamics Reinforcement Learner , 2008, UAI.
[32] Peter Stone,et al. Hierarchical model-based reinforcement learning: R-max + MAXQ , 2008, ICML '08.
[33] Stefan Schaal,et al. Reinforcement learning of motor skills with policy gradients , 2008 .
[34] Shie Mannor,et al. Regularized Fitted Q-iteration: Application to Planning , 2008, EWRL.
[35] Andrew G. Barto,et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining , 2009, NIPS.
[36] Alessandro Lazaric,et al. Analysis of a Classification-based Policy Iteration Algorithm , 2010, ICML.
[37] Satinder P. Singh,et al. Linear options , 2010, AAMAS.
[38] Scott Kuindersma,et al. Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories , 2010, NIPS.
[39] Doina Precup,et al. Optimal policy switching algorithms for reinforcement learning , 2010, AAMAS.
[40] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[41] Nicholas Roy,et al. Efficient Planning under Uncertainty with Macro-actions , 2014, J. Artif. Intell. Res..
[42] Marco Wiering,et al. Connectionist reinforcement learning for intelligent unit micro management in StarCraft , 2011, The 2011 International Joint Conference on Neural Networks.
[43] Leslie Pack Kaelbling,et al. DetH*: Approximate Hierarchical Solution of Large Markov Decision Processes , 2011, IJCAI.
[44] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[45] David Silver,et al. Compositional Planning Using Optimal Option Models , 2012, ICML.
[47] Thomas G. Dietterich,et al. PAC Optimal Planning for Invasive Species Management: Improved Exploration for Reinforcement Learning from Simulator-Defined MDPs , 2013, AAAI.
[48] Shie Mannor,et al. Temporal Difference Methods for the Variance of the Reward To Go , 2013, ICML.
[49] Ned Djilali,et al. GridLAB-D: An Agent-Based Simulation Framework for Smart Grids , 2014, J. Appl. Math..
[50] Shie Mannor,et al. Time-regularized interrupting options , 2014, ICML.
[51] Shie Mannor,et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations , 2014, ICML.