Explorations in efficient reinforcement learning
[1] Edsger W. Dijkstra,et al. A note on two problems in connexion with graphs , 1959, Numerische Mathematik.
[2] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[3] F. d'Epenoux,et al. A Probabilistic Production and Inventory Problem , 1963 .
[4] R. Bellman,et al. V. Adaptive Control Processes , 1964 .
[5] Gwilym M. Jenkins,et al. Time series analysis, forecasting and control , 1972 .
[7] Nils J. Nilsson,et al. Problem-solving methods in artificial intelligence , 1971, McGraw-Hill computer science series.
[8] J. Albus. A Theory of Cerebellar Function , 1971 .
[9] E. J. Sondik,et al. The Optimal Control of Partially Observable Markov Decision Processes , 1971 .
[10] Alan J. Mayne,et al. Generalized Inverse of Matrices and its Applications , 1972 .
[11] W. J. Studden,et al. Theory Of Optimal Experiments , 1972 .
[12] K. S. Banerjee. Generalized Inverse of Matrices and Its Applications , 1973 .
[13] Ingo Rechenberg,et al. Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .
[14] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[15] W. Vent,et al. Review of: Rechenberg, Ingo, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. 170 pp., 36 figs. Frommann‐Holzboog‐Verlag, Stuttgart 1973. Paperback , 1975 .
[16] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[17] James S. Albus,et al. New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC) , 1975 .
[18] Hans J. Berliner,et al. Experiences in Evaluation with BKG - A Program that Plays Backgammon , 1977, IJCAI.
[19] Jon Louis Bentley,et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1976, TOMS.
[20] J J Hopfield,et al. Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.
[21] Jan Telgen,et al. Stochastic Dynamic Programming , 1982 .
[22] Geoffrey E. Hinton,et al. OPTIMAL PERCEPTUAL INFERENCE , 1983 .
[23] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[24] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[25] W. Hamilton,et al. The Evolution of Cooperation , 1984 .
[26] Donald A. Berry,et al. Bandit Problems: Sequential Allocation of Experiments , 1986 .
[27] Michael Ian Shamos,et al. Computational geometry: an introduction , 1985 .
[28] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[29] Teuvo Kohonen,et al. Self-Organization and Associative Memory , 1988 .
[30] Bernard Widrow,et al. Adaptive switching circuits , 1988 .
[31] N. Wermuth,et al. Graphical Models for Associations between Variables, some of which are Qualitative and some Quantitative , 1989 .
[32] Ingo Rechenberg,et al. Evolution Strategy: Nature’s Way of Optimization , 1989 .
[33] B. Widrow,et al. The truck backer-upper: an example of self-learning in neural networks , 1989, International 1989 Joint Conference on Neural Networks.
[34] C. Watkins. Learning from delayed rewards , 1989 .
[35] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[36] Eric B. Baum,et al. A Proposal for More Powerful Learning Algorithms , 1989, Neural Computation.
[37] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.
[38] E. Ziegel. Optimal design and analysis of experiments , 1990 .
[39] J. Bather,et al. Multi‐Armed Bandit Allocation Indices , 1990 .
[40] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[41] Stephen M. Omohundro,et al. Bumptrees for Efficient Function, Constraint and Classification Learning , 1990, NIPS.
[42] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[43] J. Stephen Judd,et al. Neural network design and the complexity of learning , 1990, Neural network modeling and connectionism.
[44] Michael I. Jordan,et al. Hierarchies of Adaptive Experts , 1991, NIPS.
[45] John R. Koza,et al. Genetic evolution and co-evolution of computer programs , 1991 .
[46] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[47] Jürgen Schmidhuber,et al. Learning to generate sub-goals for action sequences , 1991 .
[48] Steven J. Nowlan,et al. Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures , 1991 .
[49] Stewart W. Wilson,et al. A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers , 1991 .
[50] Sebastian Thrun,et al. Active Exploration in Dynamic Environments , 1991, NIPS.
[51] Jürgen Schmidhuber,et al. Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[52] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .
[53] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[54] Satinder P. Singh,et al. The Efficient Learning of Multiple Task Sequences , 1991, NIPS.
[55] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[56] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1992, Math. Control. Signals Syst..
[57] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.
[58] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[59] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[60] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
[61] Steven Douglas Whitehead,et al. Reinforcement learning for the adaptive control of perception and action , 1992 .
[62] S. Resnick. Adventures in stochastic processes , 1992 .
[63] Sebastian Thrun,et al. Efficient Exploration In Reinforcement Learning , 1992 .
[64] David A. Cohn,et al. Neural Network Exploration Using Optimal Experiment Design , 1993, NIPS.
[65] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[66] Bernd Fritzke. Supervised Learning with Growing Cell Structures , 1993, NIPS.
[67] Michael K. Sahota,et al. Real-time intelligent behaviour in dynamic environments : soccer-playing robots , 1993 .
[68] K. Lindgren,et al. Cooperation and community structure in artificial ecosystems , 1993 .
[69] Terrence J. Sejnowski,et al. Temporal Difference Learning of Position Evaluation in the Game of Go , 1993, NIPS.
[70] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[71] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[72] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[73] Ming Li,et al. An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.
[74] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[75] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[76] Astro Teller,et al. The evolution of mental models , 1994 .
[77] Michael I. Jordan,et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[78] Leslie Pack Kaelbling,et al. Acting Optimally in Partially Observable Stochastic Domains , 1994, AAAI.
[79] M.A.F. Mcdonald,et al. Approximate Discounted Dynamic Programming Is Unreliable , 1994 .
[80] Dana Ron,et al. Learning probabilistic automata with variable memory length , 1994, COLT '94.
[81] Sebastian Thrun,et al. Learning to Play the Game of Chess , 1994, NIPS.
[82] Michael L. Littman,et al. Memoryless policies: theoretical limitations and practical results , 1994 .
[83] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[84] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[85] Maja J. Mataric,et al. Interaction and intelligent behavior , 1994 .
[86] Shumeet Baluja,et al. A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning , 1994 .
[87] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[88] Dave Cliff,et al. Adding Temporary Memory to ZCS , 1994, Adapt. Behav..
[89] Luis M. de Campos,et al. Probability Intervals: a Tool for uncertain Reasoning , 1994, Int. J. Uncertain. Fuzziness Knowl. Based Syst..
[90] Stewart W. Wilson. ZCS: A Zeroth Level Classifier System , 1994, Evolutionary Computation.
[91] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn..
[92] Sebastian Thrun,et al. Finding Structure in Reinforcement Learning , 1994, NIPS.
[93] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.
[94] Mark B. Ring. Continual learning in reinforcement environments , 1995, GMD-Bericht.
[95] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[96] S. Hochreiter,et al. Reinforcement Driven Information Acquisition in Non-Deterministic Environments , 1995 .
[97] Chen K. Tham,et al. Reinforcement learning of multiple tasks using a hierarchical CMAC architecture , 1995, Robotics Auton. Syst..
[98] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[99] Manuela M. Veloso,et al. Beating a Defender in Robotic Soccer: Memory-Based Learning of a Continuous Function , 1995, NIPS.
[100] A. Roth,et al. Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term , 1995 .
[101] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[102] Pawel Cichosz,et al. Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning , 1994, J. Artif. Intell. Res..
[103] Stuart J. Russell,et al. Approximating Optimal Policies for Partially Observable Stochastic Domains , 1995, IJCAI.
[104] Stewart W. Wilson. Classifier Fitness Based on Accuracy , 1995, Evolutionary Computation.
[105] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
[106] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[107] Yoshua Bengio,et al. Hierarchical Recurrent Neural Networks for Long-Term Dependencies , 1995, NIPS.
[108] Erkki Oja,et al. Signal Separation by Nonlinear Hebbian Learning , 1995 .
[109] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[110] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[111] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[112] M. A. Wiering. TD Learning of Game Evaluation Functions with Hierarchical Neural Architectures , 1995 .
[113] Tuomas Sandholm,et al. On Multiagent Q-Learning in a Semi-Competitive Domain , 1995, Adaption and Learning in Multi-Agent Systems.
[114] Pattie Maes,et al. Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments , 1996 .
[115] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[116] Pattie Maes,et al. Incremental Self-Improvement for Life-Time Multi-Agent Reinforcement Learning , 1996 .
[117] K. Trovato. A* planning in discrete configuration spaces of autonomous systems , 1996 .
[118] Wenju Liu,et al. Planning in Stochastic Domains: Problem Characteristics and Approximation , 1996 .
[119] Peter Dayan,et al. Exploration bonuses and dual control , 1996 .
[120] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.
[121] Marco Dorigo,et al. Ant system: optimization by a colony of cooperating agents , 1996, IEEE Trans. Syst. Man Cybern. Part B.
[122] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[123] Mark Humphrys,et al. Action Selection methods using Reinforcement Learning , 1996 .
[124] Jordan B. Pollack,et al. Why did TD-Gammon Work? , 1996, NIPS.
[125] Jürgen Schmidhuber,et al. Solving POMDPs with Levin Search and EIRA , 1996, ICML.
[126] Michael L. Littman,et al. Algorithms for Sequential Decision Making , 1996 .
[128] John N. Tsitsiklis,et al. Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.
[129] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 2005, Machine Learning.
[130] Jieyu Zhao,et al. Simple Principles of Metalearning , 1996 .
[131] Jeff G. Schneider,et al. Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning , 1996, NIPS.
[132] Jürgen Schmidhuber,et al. A General Method For Incremental Self-Improvement And Multi-Agent Learning In Unrestricted Environments , 1999 .
[133] Jürgen Schmidhuber,et al. Evolving Soccer Strategies , 1997, ICONIP.
[134] Fernando J. Pineda,et al. Mean-Field Theory for Batched TD(λ) , 1997, Neural Computation.
[135] Chun-Shin Lin,et al. Learning convergence of CMAC technique , 1997, IEEE Trans. Neural Networks.
[136] James A. Hendler,et al. Co-evolving Soccer Softbot Team Coordination with Genetic Programming , 1997, RoboCup.
[137] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[138] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[139] Hiroaki Kitano,et al. RoboCup: The Robot World Cup Initiative , 1997, AGENTS '97.
[140] Luc Steels,et al. Constructing and Sharing Perceptual Distinctions , 1997, ECML.
[141] Jürgen Schmidhuber,et al. HQ-Learning , 1997, Adapt. Behav..
[142] Ashwin Ram,et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..
[143] J. Schmidhuber. What's interesting? , 1997 .
[144] Rafal Salustowicz,et al. Probabilistic Incremental Program Evolution , 1997, Evolutionary Computation.
[145] Tomas Landelius,et al. Reinforcement Learning and Distributed Local Model Synthesis , 1997 .
[146] Terrence J. Sejnowski,et al. The “independent components” of natural scenes are edge filters , 1997, Vision Research.
[147] Luca Maria Gambardella,et al. Ant colony system: a cooperative learning approach to the traveling salesman problem , 1997, IEEE Trans. Evol. Comput..
[148] Jürgen Schmidhuber,et al. On Learning Soccer Strategies , 1997, ICANN.
[149] Doina Precup,et al. Theoretical Results on Reinforcement Learning with Temporally Abstract Options , 1998, ECML.
[150] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[151] Andrew W. Moore,et al. Applying Online Search Techniques to Continuous-State Reinforcement Learning , 1998, AAAI/IAAI.
[152] Marco Dorigo,et al. An adaptive multi-agent routing algorithm inspired by ants behavior , 1998 .
[153] Jürgen Schmidhuber,et al. Reinforcement Learning with Self-Modifying Policies , 1998, Learning to Learn.
[154] C. Lee Giles,et al. How embedded memory in recurrent neural network architectures helps learning long-term temporal dependencies , 1998, Neural Networks.
[155] Sebastian Thrun,et al. Learning Metric-Topological Maps for Indoor Mobile Robot Navigation , 1998, Artif. Intell..
[156] Jürgen Schmidhuber,et al. Efficient model-based exploration , 1998 .
[157] Marco Dorigo,et al. Learning to Control Forest Fires , 1998 .
[158] Jürgen Schmidhuber,et al. CMAC models learn to play soccer , 1998 .
[159] A. Cassandra,et al. Exact and approximate algorithms for partially observable markov decision processes , 1998 .
[160] J. Dam,et al. Environment modelling for mobile robots: neural learning for sensor fusion , 1998 .
[161] Manuela M. Veloso,et al. Team-partitioned, opaque-transition reinforcement learning , 1999, AGENTS '99.
[162] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[163] Robert Givan,et al. Bounded-parameter Markov decision processes , 2000, Artif. Intell..