Online Discovery of Feature Dependencies

Online representational expansion techniques have improved the learning speed of existing reinforcement learning (RL) algorithms in low-dimensional domains, yet these methods do not scale well to high-dimensional problems. We conjecture that one of the main difficulties limiting this scaling is that features defined over the full-dimensional state space often generalize poorly. Hence, we introduce incremental Feature Dependency Discovery (iFDD), a computationally inexpensive method for representational expansion that can be combined with any online, value-based RL method that uses binary features. Unlike other online expansion techniques, iFDD creates new features in low-dimensional subspaces of the full state space where feedback errors persist. We provide convergence and computational complexity guarantees for iFDD, and we show empirically that it scales well to high-dimensional multi-agent planning domains with hundreds of millions of state-action pairs.
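To make the expansion mechanism concrete, the following is a minimal sketch (not the paper's implementation) of an iFDD-style discovery step. It assumes binary base features, a user-chosen relevance threshold `xi`, and candidate features formed as pairwise conjunctions of active features; the names `iFDD`, `discover`, and `active_features` are illustrative, and details such as sparsification of the active feature set are omitted.

```python
import itertools
from collections import defaultdict

class iFDD:
    """Minimal sketch of incremental Feature Dependency Discovery.

    Tracks candidate conjunctions of active binary features, accumulates
    the magnitude of feedback (TD) errors for each candidate, and promotes
    a candidate to a full feature once its accumulated relevance exceeds
    the threshold `xi`. Illustrative only.
    """

    def __init__(self, num_initial_features, xi=1.0):
        self.num_features = num_initial_features  # current total feature count
        self.xi = xi                              # discovery (relevance) threshold
        self.relevance = defaultdict(float)       # candidate pair -> accumulated |TD error|
        self.discovered = {}                      # candidate pair -> index of the new feature

    def active_features(self, base_active):
        """Return indices of all active features: the active base binary
        features plus any discovered conjunction whose parents are both active."""
        active = set(base_active)
        for pair, idx in self.discovered.items():
            if pair[0] in active and pair[1] in active:
                active.add(idx)
        return active

    def discover(self, base_active, td_error):
        """Accumulate |td_error| for each pair of active base features and
        add the pair as a new feature once its relevance exceeds xi."""
        for pair in itertools.combinations(sorted(base_active), 2):
            if pair in self.discovered:
                continue
            self.relevance[pair] += abs(td_error)
            if self.relevance[pair] > self.xi:
                self.discovered[pair] = self.num_features
                self.num_features += 1
```

In use, the agent would run its usual TD or Sarsa update over the indices returned by `active_features`, then call `discover` with the observed TD error after each transition; newly added conjunction features would typically start with zero weight so the value estimate is unchanged at the moment of discovery.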
