TD(λ) networks: temporal-difference networks with eligibility traces

Temporal-difference (TD) networks were introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to perform more general multi-step backups of future predictions. In this work, we introduce a generalization of the 1-step TD network specification based on the TD(λ) learning algorithm, creating TD(λ) networks. We present experimental results showing that TD(λ) networks can learn solutions in more complex environments than 1-step TD networks, and that on problems solvable by TD networks, TD(λ) networks generally learn solutions much faster than their 1-step counterparts. Finally, we present an analysis of our algorithm showing that the computational cost of TD(λ) networks is only slightly greater than that of TD networks.
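
The mechanism the abstract refers to is the eligibility trace used by TD(λ). As a rough illustration only (not the paper's TD(λ) network algorithm, which trains a network of interrelated predictions with its own targets and condition functions), the sketch below shows a single linear predictor updated with an accumulating trace; all names (w, e, alpha, gamma, lam, r_next) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a TD(lambda) update with an accumulating eligibility
# trace for one linear predictor v(x) = w . x. This only illustrates the
# multi-step credit assignment that the paper adds to TD networks; it is
# not the paper's full algorithm, and the names here are assumptions.

def td_lambda_step(w, e, x_t, x_next, r_next, alpha=0.1, gamma=0.95, lam=0.8):
    v_t = w @ x_t                          # prediction at time t
    v_next = w @ x_next                    # bootstrapped prediction at t+1
    delta = r_next + gamma * v_next - v_t  # 1-step TD error
    e = gamma * lam * e + x_t              # decay, then accumulate the trace
    w = w + alpha * delta * e              # trace spreads the error backward
    return w, e

# Example usage on random data: with lam = 0 this reduces to the 1-step
# TD(0)-style backup used by the original TD network learning rule.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    w, e = np.zeros(n), np.zeros(n)
    x = rng.random(n)
    for _ in range(10):
        x_next, r_next = rng.random(n), rng.random()
        w, e = td_lambda_step(w, e, x, x_next, r_next)
        x = x_next
```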

[1] Luc De Raedt et al. Proceedings of the 22nd International Conference on Machine Learning, 2005.

[2] Richard S. Sutton et al. Predictive Representations of State, 2001, NIPS.

[3] Michael R. James et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.

[4] Robert E. Schapire et al. A New Approach to Unsupervised Learning in Deterministic Environments, 1990.

[5] K. Aberer et al. German National Research Center for Information Technology, 2007.

[6] Michael R. James et al. Learning Predictive State Representations in Dynamical Systems Without Reset, 2005, ICML.

[7] Steven J. Bradtke et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2004, Machine Learning.

[8] Richard S. Sutton et al. Temporal-Difference Networks, 2004, NIPS.

[9] Satinder P. Singh et al. A Nonlinear Predictive State Representation, 2003, NIPS.

[10] Justin A. Boyan et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.

[11] Richard S. Sutton et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.

[12] Leslie Pack Kaelbling et al. Hierarchical Learning in Stochastic Domains: Preliminary Results, 1993, ICML.

[13] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[14] Michail G. Lagoudakis et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.

[15] Richard S. Sutton et al. Temporal-Difference Networks with History, 2005, IJCAI.

[16] H. Jaeger. Discrete-Time, Discrete-Valued Observable Operator Models: A Tutorial, 2003.

[17] Peter Stone et al. Learning Predictive State Representations, 2003, ICML.

[18] Richard S. Sutton et al. Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.

[19] Peter Dayan et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.