Goal-driven dimensionality reduction for reinforcement learning

Defining a state representation on which optimal control can perform well is a tedious but crucial process. It typically requires expert knowledge, does not generalize straightforwardly across different tasks, and strongly influences the quality of the learned controller. In this paper, we present an autonomous feature construction method for learning low-dimensional manifolds of goal-relevant features jointly with an optimal controller using reinforcement learning. Our method combines information-theoretic algorithms with principal component analysis to perform a return-weighted reduction of the state representation. The method does not require any preprocessing of the data, does not impose strong restrictions on the state representation, and substantially improves learning performance by reducing the number of samples required. We show that our method can learn high-quality controllers in redundant state spaces, even directly from pixels, and that it outperforms both classical and state-of-the-art deep learning approaches.
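
To make the "return-weighted reduction" concrete, below is a minimal sketch of how such a step could look: state samples are weighted by an exponential transformation of their returns before computing the principal components, so that high-return regions of the state space dominate the learned projection. The function name, the temperature parameter `beta`, and the exponential weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def return_weighted_pca(states, returns, n_components, beta=1.0):
    """Sketch of a return-weighted PCA step (illustrative, not the paper's algorithm).

    states       : (N, D) array of visited states
    returns      : (N,)   array of returns associated with each state
    n_components : dimensionality of the reduced representation
    beta         : hypothetical temperature of the exponential return weighting
    """
    # Exponentiate (shifted) returns so high-return samples get larger weights
    w = np.exp(beta * (returns - returns.max()))
    w /= w.sum()

    # Weighted mean and weighted covariance of the state samples
    mean = w @ states
    centered = states - mean
    cov = (centered * w[:, None]).T @ centered

    # Keep the leading eigenvectors as the low-dimensional feature basis
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    projection = eigvecs[:, order]  # (D, n_components)

    return projection, mean

# Usage sketch: project a new state s onto the goal-relevant subspace
# z = (s - mean) @ projection
```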
