[1] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[2] Walter L. Smith. Probability and Statistics, 1959, Nature.
[3] J. Hartmanis. Algebraic structure theory of sequential machines (Prentice-Hall international series in applied mathematics), 1966.
[4] J. Hartmanis, et al. Algebraic Structure Theory of Sequential Machines, 1966.
[5] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[6] Tom M. Mitchell, et al. LEAP: A Learning Apprentice for VLSI Design, 1985, IJCAI.
[7] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[8] E. Visalberghi, et al. "Language" and intelligence in monkeys and apes: Do monkeys ape?, 1990.
[9] Steven D. Whitehead, et al. Complexity and Cooperation in Q-Learning, 1991, ML.
[10] Steven D. Whitehead, et al. A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning, 1991, AAAI.
[11] Roger B. Myerson, et al. Game Theory: Analysis of Conflict, 1991.
[12] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes, 1991.
[13] Paul E. Utgoff, et al. Two Kinds of Training Information for Evaluation Function Learning, 1991, AAAI.
[14] Long Ji Lin, et al. Self-improvement Based on Reinforcement Learning, Planning and Teaching, 1991, ML.
[15] David Lee, et al. Online minimization of transition systems (extended abstract), 1992, STOC '92.
[16] G. Fiorito, et al. Observational Learning in Octopus vulgaris, 1992, Science.
[17] C. Atkeson, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[18] J. Mi, et al. A comparison of the Bonferroni and Scheffé bounds, 1993.
[19] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[20] Donald Michie, et al. Knowledge, Learning and Machine Intelligence, 1993.
[21] Henry Lieberman, et al. Mondrian: a teachable graphical editor, 1993, INTERCHI.
[22] A. Russon, et al. Imitation in free-ranging rehabilitant orangutans (Pongo pygmaeus), 1993, Journal of Comparative Psychology.
[23] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[24] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[25] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[26] Masayuki Inaba, et al. Learning by watching: extracting reusable task knowledge from visual observation of human performance, 1994, IEEE Trans. Robotics Autom.
[27] Ivan Bratko, et al. Reconstructing Human Skill with Machine Learning, 1994, ECAI.
[28] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[29] Peter Bakker, et al. Robot see, robot do: An overview of robot imitation, 1996.
[30] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[31] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[32] Ivan Bratko, et al. Skill Reconstruction as Induction of LQ Controllers with Subgoals, 1997, IJCAI.
[33] Aude Billard, et al. Learning to Communicate Through Imitation in Autonomous Robots, 1997, ICANN.
[34] Russell Greiner, et al. Why Experimentation can be better than "Perfect Guidance", 1997, ICML.
[35] Stefan Schaal, et al. Robot Learning From Demonstration, 1997, ICML.
[36] Craig Boutilier, et al. Abstraction and Approximate Decision-Theoretic Planning, 1997, Artif. Intell.
[37] Robert Givan, et al. Model Minimization in Markov Decision Processes, 1997, AAAI/IAAI.
[38] Yiannis Demiris, et al. Do Robots Ape?, 1997.
[39] Andrew G. Barto, et al. Reinforcement learning, 1998.
[40] Maja J. Matarić, et al. Behavior-based primitives for articulated control, 1998.
[41] Maja J. Mataric, et al. Using communication to reduce locality in distributed multiagent learning, 1997, J. Exp. Theor. Artif. Intell.
[42] S. Pattinson, et al. Learning to fly, 1998.
[43] Kerstin Dautenhahn, et al. Mapping between dissimilar bodies: Affordances and the algebraic foundations of imitation, 1998.
[44] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.
[45] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[46] R. Byrne, et al. Priming primates: Human and otherwise, 1998, Behavioral and Brain Sciences.
[47] John E. Laird, et al. Learning Hierarchical Performance Knowledge by Observation, 1999, ICML.
[48] Craig Boutilier, et al. Sequential Optimality and Coordination in Multiagent Systems, 1999, IJCAI.
[49] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.
[50] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[51] Ramon López de Mántaras, et al. Imitating human performances to automatically generate expressive jazz ballads, 1999.
[52] Aude Billard, et al. DRAMA, a Connectionist Architecture for Control and Learning in Autonomous Robots, 1999, Adapt. Behav.
[53] Aude Billard, et al. Imitation skills as a means to enhance learning of a synthetic proto-language in an autonomous robot, 1999.
[54] P. Todd, et al. Is it really imitation? A review of simple mechanisms in social information gathering, 1999.
[55] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[56] Jeffrey M. Forbes, et al. Practical reinforcement learning in continuous domains, 2000.
[57] Kerstin Dautenhahn, et al. Learning how to do things with imitation, 2000.
[58] Manuela M. Veloso, et al. Rational and Convergent Learning in Stochastic Games, 2001, IJCAI.
[59] Mario Paolucci, et al. Intelligent Social Learning, 2001, J. Artif. Soc. Soc. Simul.
[60] Craig Boutilier, et al. A Bayesian Approach to Imitation in Reinforcement Learning, 2003, IJCAI.
[61] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[62] S. Bocionek, et al. Robot Programming by Demonstration (RPD): Supporting the induction by human interaction, 1996, Machine Learning.
[63] Andrew W. Moore, et al. Locally Weighted Learning for Control, 1997, Artificial Intelligence Review.
[64] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[65] Paul Bourgine, et al. Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty, 1999, Machine Learning.
[66] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents, 1998, Machine Learning.
[67] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[70] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[71] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.