Autonomous Learning of State Representations for Control: An Emerging Field Aims to Autonomously Learn State Representations for Reinforcement Learning Agents from Their Real-World Sensor Observations

This article reviews an emerging field that aims for autonomous reinforcement learning (RL) directly on sensor observations. Straightforward end-to-end RL has recently shown remarkable success, but it relies on very large numbers of samples. As this is not feasible in robotics, we review two approaches that learn intermediate state representations from previous experience: deep auto-encoders and slow feature analysis. We analyze the theoretical properties of these representations and point out potential improvements.
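
To make the second approach concrete, the following is a minimal sketch of the standard slow feature analysis (SFA) optimization problem; the notation is introduced here for illustration and is not drawn from the article. Given a sequence of observations x_t, SFA seeks feature functions phi_j whose outputs change as slowly as possible over time while remaining normalized and mutually decorrelated:

    \min_{\phi_j} \; \mathbb{E}_t\!\big[(\phi_j(x_{t+1}) - \phi_j(x_t))^2\big]
    \quad \text{s.t.} \quad
    \mathbb{E}_t[\phi_j(x_t)] = 0, \;\;
    \mathbb{E}_t[\phi_j(x_t)^2] = 1, \;\;
    \mathbb{E}_t[\phi_i(x_t)\,\phi_j(x_t)] = 0 \;\; \text{for } i < j.

The slowest features phi_1, ..., phi_k extracted from previous experience can then serve as a compact state representation on which a standard RL algorithm is trained, analogous to using the bottleneck layer of a deep auto-encoder as the state.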
