Recent Advances in Reinforcement Learning
[1] W. Wasow. A note on the inversion of matrices by random walks, 1952.
[2] R. Bellman, et al. Dynamic Programming and Markov Processes, 1960.
[3] D. Blackwell. Discrete Dynamic Programming, 1962.
[4] Eric V. Denardo, et al. Computing a Bias-Optimal Policy in a Discrete-Time Markov Decision Problem, 1970, Oper. Res.
[5] A. H. Klopf, et al. Brain Function and Adaptive Systems: A Heterostatic Theory, 1972.
[6] A. Hordijk, et al. A Modified Form of the Iterative Method of Dynamic Programming, 1975.
[7] Philip Klahr, et al. Advice-Taking and Knowledge Refinement: An Iterative View of Skill Acquisition, 1980.
[8] James S. Albus, et al. Brains, behavior, and robotics, 1981.
[9] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[10] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[11] Paul J. Schweitzer, et al. Successive Approximation Methods for Solving Nested Functional Equations in Markov Decision Problems, 1984, Math. Oper. Res.
[12] John H. Holland, et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, 1995.
[13] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[14] Leslie Pack Kaelbling. Rex: A Symbolic Language for the Design and Parallel Implementation of Embedded Systems, 1987.
[15] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[16] Jude Shavlik, et al. An Approach to Combining Explanation-based and Neural Learning Algorithms, 1989.
[17] Kumpati S. Narendra, et al. Learning automata: an introduction, 1989.
[18] Joachim Diederich. "Learning by Instruction" in Connectionist Systems, 1989, ML.
[19] Richard S. Sutton, et al. Learning and Sequential Decision Making, 1989.
[20] Fu, et al. Integration of neural heuristics into knowledge-based inference, 1989.
[21] A. Jalali, et al. Computationally efficient adaptive control algorithms for Markov chains, 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[22] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[23] Joseph F. Engelberger, et al. Robotics in Service, 1989.
[24] Jude Shavlik, et al. Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks, 1990, AAAI.
[25] Michael Hucka, et al. Correcting and Extending Domain Knowledge using Outside Guidance, 1990, ML.
[26] A. Jalali, et al. A distributed asynchronous algorithm for expected average cost dynamic programming, 1990, 29th IEEE Conference on Decision and Control.
[27] M. Puterman, et al. An improved algorithm for solving communicating average reward Markov decision processes, 1991.
[28] Steven D. Whitehead, et al. A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning, 1991, AAAI.
[29] Richard S. Sutton, et al. Reinforcement learning architectures for animats, 1991.
[30] Paul E. Utgoff, et al. Two Kinds of Training Information for Evaluation Function Learning, 1991, AAAI.
[31] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[32] Richard Yee, et al. Abstraction in Control Learning, 1992.
[33] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
[34] Toru Ishida, et al. Moving Target Search with Intelligence, 1992, AAAI.
[35] Sridhar Mahadevan, et al. Enhancing Transfer in Reinforcement Learning by Building Stochastic Models of Robot Actions, 1992, ML.
[36] Tom M. Mitchell, et al. A Personal Learning Apprentice, 1992, AAAI.
[37] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[38] Satinder P. Singh, et al. Reinforcement Learning with a Hierarchy of Abstract Models, 1992, AAAI.
[39] Sebastian Thrun, et al. Efficient Exploration in Reinforcement Learning, 1992.
[40] C. Lee Giles, et al. Training Second-Order Recurrent Neural Networks using Hints, 1992, ML.
[41] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[42] Devika Subramanian, et al. A Multistrategy Learning Scheme for Agent Knowledge Acquisition, 1993, Informatica.
[43] Andreas Weigend, et al. On overfitting and the effective number of hidden units, 1993.
[44] Jonas Karlsson, et al. Learning Multiple Goal Behavior via Task Decomposition and Dynamic Policy Merging, 1993.
[45] Andrew G. Barto, et al. Monte Carlo Matrix Inversion and Reinforcement Learning, 1993, NIPS.
[46] John E. Laird, et al. Learning Procedures from Interactive Natural Language Instructions, 1993, ICML.
[47] Reid G. Simmons, et al. Complexity Analysis of Real-Time Reinforcement Learning, 1993, AAAI.
[48] Richard S. Sutton, et al. Online Learning with Random Representations, 1993, ICML.
[49] Philip M. Long, et al. Worst-case quadratic loss bounds for a generalization of the Widrow-Hoff rule, 1993, COLT '93.
[50] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[51] Sebastian Thrun, et al. Integrating Inductive Neural Network Learning and Explanation-Based Learning, 1993, IJCAI.
[52] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[53] J. Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, IEEE International Conference on Neural Networks.
[54] Long-Ji Lin. Scaling Up Reinforcement Learning for Robot Control, 1993, International Conference on Machine Learning.
[55] Leslie Pack Kaelbling, et al. Hierarchical Learning in Stochastic Domains: Preliminary Results, 1993, ICML.
[56] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[57] Satinder P. Singh, et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.
[58] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[59] M. A. F. McDonald, et al. Approximate Discounted Dynamic Programming Is Unreliable, 1994.
[60] Patrick Suppes, et al. Language and Learning for Robots, 1994.
[61] Sridhar Mahadevan, et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning, 1994, ICML.
[62] Richard Goodwin. Reasoning About When to Start Acting, 1994, AIPS.
[63] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[64] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[65] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[66] Jude W. Shavlik, et al. Knowledge-Based Artificial Neural Networks, 1994, Artif. Intell.
[67] Raymond J. Mooney, et al. Theory Refinement Combining Analytical and Empirical Methods, 1994, Artif. Intell.
[68] Nils J. Nilsson, et al. Teleo-Reactive Programs for Agent Control, 1993, J. Artif. Intell. Res.
[69] Jude W. Shavlik, et al. Using Sampling and Queries to Extract Rules from Trained Neural Networks, 1994, ICML.
[70] Doug Riecken. Intelligent agents, 1994, CACM.
[71] Matthias Heger, et al. Consideration of Risk in Reinforcement Learning, 1994, ICML.
[72] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[73] Garrison W. Cottrell, et al. Towards Instructable Connectionist Systems, 1995.
[74] Craig Boutilier, et al. Process-Oriented Planning and Average-Reward Optimality, 1995, IJCAI.
[75] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[76] Manfred K. Warmuth, et al. Additive versus exponentiated gradient updates for linear prediction, 1995, STOC '95.
[77] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[78] Giovanni Soda, et al. Unified Integration of Explicit Knowledge and Learning by Example in Recurrent Networks, 1995, IEEE Trans. Knowl. Data Eng.
[79] Yaser S. Abu-Mostafa, et al. Hints, 1995, Neural Computation.
[80] Shlomo Zilberstein, et al. Operational Rationality through Compilation of Anytime Algorithms, 1995, AI Mag.
[81] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[82] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[83] Matthias Heger. The Loss from Imperfect Value Functions in Expectation-Based and Minimax-Based Tasks, 1996, Machine Learning.