Beyond Reward: The Problem of Knowledge and Data

Intelligence can be defined, informally, as knowing a lot and being able to use that knowledge flexibly to achieve one's goals. In this sense it is clear that knowledge is central to intelligence. However, it is less clear exactly what knowledge is, what gives it meaning, and how it can be efficiently acquired and used. In this talk we re-examine aspects of these age-old questions in light of modern experience (and particularly in light of recent work in reinforcement learning). Such questions are not just of philosophical or theoretical import; they directly affect the practicality of modern knowledge-based systems, which tend to become unwieldy and brittle, and thus difficult to change, as the knowledge base becomes large and diverse.
