Recent Advances in Reinforcement Learning

Contents:

Editorial. T.G. Dietterich.
Introduction. L.P. Kaelbling.
Efficient Reinforcement Learning Through Symbiotic Evolution. D.E. Moriarty, R. Miikkulainen.
Linear Least-Squares Algorithms for Temporal Difference Learning. S.J. Bradtke, A.G. Barto.
Feature-Based Methods for Large Scale Dynamic Programming. J.N. Tsitsiklis, B. Van Roy.
On the Worst-Case Analysis of Temporal-Difference Learning Algorithms. R.E. Schapire, M.K. Warmuth.
Reinforcement Learning with Replacing Eligibility Traces. S.P. Singh, R.S. Sutton.
Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results. S. Mahadevan.
The Loss from Imperfect Value Functions in Expectation-Based and Minimax-Based Tasks. M. Heger.
The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms. S. Koenig, R.G. Simmons.
Creating Advice-Taking Reinforcement Learners. R. Maclin, J.W. Shavlik.
Technical Note: Incremental Multi-Step Q-Learning. J. Peng, R.J. Williams.
