Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
Patrick M. Pilarski | Doina Precup | Richard S. Sutton | Adam White | Thomas Degris | Michael Delp | Joseph Modayil
[1] Michael Cunningham. Intelligence: Its Organization and Development, 1972.
[2] Jack Buchanan. Review of "Computer Models of Thought and Language by Roger C. Schank and Kenneth Mark Colby, eds.", W. H. Freeman & Co., San Francisco, 1973, 1974, SGAR.
[3] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[4] Gary L. Drescher, et al. Made-up minds - a constructivist approach to artificial intelligence, 1991.
[5] Paul R. Cohen, et al. Neo: learning conceptual knowledge by sensorimotor interaction with an environment, 1997, AGENTS '97.
[6] Benjamin Kuipers, et al. Map Learning with Uninterpreted Sensors and Effectors, 1995, Artif. Intell.
[7] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[8] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[9] Paul R. Cohen, et al. A Method for Clustering the Experiences of a Mobile Robot that Accords with Human Judgments, 2000, AAAI/IAAI.
[10] Chen Yu, et al. A multimodal learning interface for grounding spoken language in sensory perceptions, 2003, ICMI '03.
[11] Mark B. Ring. CHILD: A First Step Towards Continual Learning, 1997, Machine Learning.
[12] Lorenzo Natale, et al. Linking Action to Perception in a Humanoid Robot: a Developmental Approach to Grasping, 2004.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[15] Richard S. Sutton, et al. Temporal Abstraction in Temporal-difference Networks, 2005, NIPS.
[16] L. P. Kaelbling, et al. Learning Symbolic Models of Stochastic Domains, 2007, J. Artif. Intell. Res.
[17] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS 2008.
[18] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[19] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[20] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[21] Richard S. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010, Artificial General Intelligence.
[22] Tim Oates, et al. Learning in Worlds with Objects, 2017, Encyclopedia of Machine Learning and Data Mining.
[23] R. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010.