Better Generalization with Forecasts

Predictive methods are becoming increasingly popular for representing world knowledge in autonomous agents. A recently introduced predictive method that shows particular promise is the General Value Function (GVF), which is more flexible than earlier predictive methods and can more readily capture regularities in the agent's sensorimotor stream. This paper investigates the ability of GVFs (also called "forecasts") to capture such regularities. We generate focused sets of forecasts and measure their capacity for generalization, then compare the results against a closely related predictive method, Predictive State Representations (PSRs), which has already been shown to generalize well. Our results indicate that forecasts provide a substantial improvement in generalization: they produce features that support better value-function approximation (with linear function approximators) than PSRs and generalize better to as-yet-unseen parts of the state space.
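To make the idea concrete, here is a minimal sketch (not the paper's implementation) of how a forecast can be realized as a GVF and how its predictions can serve as features for a linear value function. Each forecast is defined by a cumulant signal and a termination factor gamma and is learned by linear TD(0) from the observation stream; the class name `Forecast`, the step size, the random binary observation stream, and all other specifics below are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a forecast/GVF used as a feature generator.
# Assumptions: linear TD(0) learning, constant gamma per forecast, and a
# random binary observation stream standing in for real sensorimotor data.
import numpy as np

class Forecast:
    """One GVF: predicts the discounted sum of a cumulant under a fixed gamma."""
    def __init__(self, n_obs_features, cumulant, gamma, step_size=0.1):
        self.w = np.zeros(n_obs_features)
        self.cumulant = cumulant      # function: observation vector -> float
        self.gamma = gamma            # termination/discount factor in [0, 1)
        self.alpha = step_size

    def predict(self, phi):
        return float(self.w @ phi)

    def update(self, phi, phi_next):
        # TD(0): move toward cumulant plus discounted next prediction.
        target = self.cumulant(phi_next) + self.gamma * self.predict(phi_next)
        delta = target - self.predict(phi)
        self.w += self.alpha * delta * phi

# Usage sketch: a few forecasts over a random observation stream; their
# predictions are stacked into a feature vector that a linear value-function
# learner could consume.
rng = np.random.default_rng(0)
n_obs = 8
forecasts = [Forecast(n_obs, cumulant=lambda o, i=i: o[i], gamma=g)
             for i, g in [(0, 0.5), (1, 0.9), (2, 0.99)]]

phi = rng.integers(0, 2, n_obs).astype(float)
for _ in range(1000):
    phi_next = rng.integers(0, 2, n_obs).astype(float)
    for f in forecasts:
        f.update(phi, phi_next)
    phi = phi_next

features = np.array([f.predict(phi) for f in forecasts])  # input to a linear value learner
print(features)
```

In the paper's setting the cumulants, policies, and termination functions are chosen to capture regularities in the agent's sensorimotor stream, and the resulting forecast predictions replace or augment the raw observation features; the random stream above is only a stand-in to keep the sketch self-contained.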
