Recent Advances in Hierarchical Reinforcement Learning
[1] J. Stevens, et al. Animal Intelligence, 1883, Nature.
[2] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[3] A. L. Samuel, et al. Some studies in machine learning using the game of checkers. II: Recent progress, 1967.
[4] William A. Woods. Transition Network Grammars for Natural Language Analysis, 1970, Commun. ACM.
[5] A. H. Klopf, et al. Brain Function and Adaptive Systems: A Heterostatic Theory, 1972.
[6] Richard Fikes, et al. Learning and Executing Generalized Robot Plans, 1993, Artif. Intell.
[7] P. Varaiya, et al. Multilayer control of large Markov chains, 1978.
[8] A. G. Barto, et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.
[9] John S. Edwards, et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence, 1983.
[10] R. Korf. Learning to solve problems by searching for macro-operators, 1983.
[11] Hassan K. Khalil, et al. Singular perturbation methods in control: analysis and design, 1986.
[12] Rodney A. Brooks, et al. Achieving Artificial Intelligence through Building Robots, 1986.
[13] David Harel, et al. Statecharts: A Visual Formalism for Complex Systems, 1987, Sci. Comput. Program.
[14] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[15] Paul J. Werbos, et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research, 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[16] D. Naidu. Singular Perturbation Methodology in Control Systems, 1988.
[17] Keiji Kanazawa, et al. A model for reasoning about persistence and causation, 1989.
[18] Satinder P. Singh, et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models, 1992, ML.
[19] Satinder P. Singh, et al. Reinforcement Learning with a Hierarchy of Abstract Models, 1992, AAAI.
[20] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[21] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[22] Michael I. Jordan, et al. Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[23] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[24] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[25] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[26] Sebastian Thrun, et al. Finding Structure in Reinforcement Learning, 1994, NIPS.
[27] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[28] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[29] Illah R. Nourbakhsh, et al. DERVISH - An Office-Navigating Robot, 1995, AI Mag.
[30] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[31] Pattie Maes, et al. Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments, 1996.
[32] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[33] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[34] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[35] Andrew G. Barto, et al. Large-scale dynamic optimization using teams of reinforcement learning agents, 1996.
[36] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[37] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[38] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[39] Leslie Pack Kaelbling, et al. Learning Topological Maps with Weak Local Odometric Information, 1997, IJCAI.
[40] R. Simmons, et al. Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models, 1998.
[41] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[42] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[43] Roderic A. Grupen, et al. A feedback control structure for on-line learning tasks, 1997, Robotics Auton. Syst.
[44] Doina Precup, et al. Theoretical Results on Reinforcement Learning with Temporally Abstract Options, 1998, ECML.
[45] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[46] Ronald E. Parr, et al. Hierarchical control and learning for Markov decision processes, 1998.
[47] V. Borkar, et al. A unified framework for hybrid control: model and optimal control theory, 1998, IEEE Trans. Autom. Control.
[48] Bruce L. Digney, et al. Learning hierarchical control structures for multiple tasks and changing environments, 1998.
[49] Xavier Boyen, et al. Tractable Inference for Complex Stochastic Processes, 1998, UAI.
[50] Jean-Arcady Meyer, et al. Learning Hierarchical Control Structures for Multiple Tasks and Changing Environments, 1998.
[51] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[52] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[53] S. Mahadevan, et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning, 1999.
[54] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.
[55] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[56] Gregory Z. Grudic, et al. Localizing Search in Reinforcement Learning, 2000, AAAI/IAAI.
[57] Sridhar Mahadevan, et al. Hierarchical Memory-Based Reinforcement Learning, 2000, NIPS.
[58] Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000, ICML 2000.
[59] Andrew G. Barto, et al. Automated State Abstraction for Options using the U-Tree Algorithm, 2000, NIPS.
[60] David Andre, et al. Programmable Reinforcement Learning Agents, 2000, NIPS.
[61] Guillermo Ricardo Simari, et al. Multiagent systems: a modern approach to distributed artificial intelligence, 2000.
[62] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[63] Sridhar Mahadevan, et al. Decision-Theoretic Planning with Concurrent Temporally Extended Actions, 2001, UAI.
[64] Sridhar Mahadevan, et al. Learning Hierarchical Partially Observable Markov Decision Process Models for Robot Navigation, 2001.
[65] Peter Stone, et al. Keepaway Soccer: A Machine Learning Testbed, 2001, RoboCup.
[66] Sridhar Mahadevan, et al. Continuous-Time Hierarchical Reinforcement Learning, 2001, ICML.
[67] Andrew G. Barto, et al. Lyapunov-Constrained Action Sets for Reinforcement Learning, 2001, ICML.
[68] Peter Stone, et al. Scaling Reinforcement Learning toward RoboCup Soccer, 2001, ICML.
[69] Andrew G. Barto, et al. Autonomous discovery of temporal abstractions from interaction with an environment, 2002.
[70] Bernhard Hengst, et al. Discovering Hierarchy in Reinforcement Learning with HEXQ, 2002, ICML.
[71] Andrew G. Barto, et al. Lyapunov Design for Safe Reinforcement Learning, 2003, J. Mach. Learn. Res.
[72] Sridhar Mahadevan, et al. Hierarchically Optimal Average Reward Reinforcement Learning, 2002, ICML.
[73] Saso Dzeroski, et al. Integrating Experimentation and Guidance in Relational Reinforcement Learning, 2002, ICML.
[74] Zhiyuan Ren, et al. A time aggregation approach to Markov decision processes, 2002, Autom.
[75] Sridhar Mahadevan, et al. Learning to Take Concurrent Actions, 2002, NIPS.
[76] Peter Dayan, et al. Dopamine: generalization and bonuses, 2002, Neural Networks.
[77] Sridhar Mahadevan, et al. Hierarchical learning and planning in partially observable Markov decision processes, 2002.
[78] Sridhar Mahadevan, et al. Approximate planning with hierarchical partially observable Markov decision process models for robot navigation, 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation.
[79] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[80] Victor R. Lesser, et al. Learning to Improve Coordinated Actions in Cooperative Distributed Problem-Solving Environments, 1998, Machine Learning.
[81] Glenn A. Iba, et al. A Heuristic Approach to the Discovery of Macro-Operators, 1989, Machine Learning.
[82] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[83] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[84] Sridhar Mahadevan, et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results, 2004, Machine Learning.
[85] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents, 1998, Machine Learning.
[86] Yoram Singer, et al. The Hierarchical Hidden Markov Model: Analysis and Applications, 1998, Machine Learning.
[87] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[88] Oliver Lemon, et al. Spoken Dialogue Management Using Hierarchical Reinforcement Learning and Dialogue Simulation, 2005.
[89] Jun Morimoto, et al. Robust Reinforcement Learning, 2005, Neural Computation.
[90] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.