Sketch-Based Linear Value Function Approximation

Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing.
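To make the idea concrete, below is a minimal, hypothetical sketch of tug-of-war (signed) hashing applied to linear value estimation. It is not the paper's implementation: the class and function names (`TugOfWarSketch`, `td0_update`), the table size `d`, the learning parameters, and the use of Python's built-in `hash` in place of a properly independent hash family are all illustrative assumptions.

```python
# A minimal illustrative sketch, assuming a sparse binary feature vector given as a
# set of active feature indices. Hash-function choices and the TD(0) update are
# stand-ins, not the paper's exact method.
import numpy as np

class TugOfWarSketch:
    def __init__(self, d, seed=0):
        self.d = d                                   # size of the sketched feature vector
        rng = np.random.default_rng(seed)
        # Random keys standing in for the bin hash h(i) and the sign hash xi(i).
        # (The theory calls for suitably independent hash families; plain `hash`
        # is used here only to keep the example self-contained.)
        self._bin_key = int(rng.integers(1, 2**31 - 1))
        self._sign_key = int(rng.integers(1, 2**31 - 1))

    def _bin(self, i):
        # Map feature index i to one of d bins.
        return hash((self._bin_key, i)) % self.d

    def _sign(self, i):
        # Map feature index i to a sign in {-1, +1}.
        return 1 if hash((self._sign_key, i)) % 2 == 0 else -1

    def sketch(self, active_features):
        """Project a set of active feature indices into a d-dimensional sketch."""
        phi = np.zeros(self.d)
        for i in active_features:
            phi[self._bin(i)] += self._sign(i)       # signed, rather than unsigned, collision
        return phi

def td0_update(theta, sketcher, s_features, r, s_next_features,
               alpha=0.01, gamma=0.99):
    """One linear TD(0) update performed directly on the sketched features."""
    phi = sketcher.sketch(s_features)
    phi_next = sketcher.sketch(s_next_features)
    td_error = r + gamma * (theta @ phi_next) - theta @ phi
    theta += alpha * td_error * phi
    return theta

# Example usage with hypothetical feature indices from a large combinatorial space.
sketcher = TugOfWarSketch(d=4096)
theta = np.zeros(sketcher.d)
theta = td0_update(theta, sketcher, {17, 90321, 555000}, r=1.0,
                   s_next_features={17, 90321, 600001})
```

The design point the example is meant to highlight: under standard (unsigned) hashing, colliding features always add their contributions together, which systematically biases the value estimate; with the random signs of the tug-of-war sketch, colliding contributions tend to cancel in expectation, which is what allows the collision-induced bias to be much smaller in practice.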
