Autonomous Learning of State Representations for Control: An Emerging Field Aims to Autonomously Learn State Representations for Reinforcement Learning Agents from Their Real-World Sensor Observations

This article reviews an emerging field that aims for autonomous reinforcement learning (RL) directly on sensor observations. Straightforward end-to-end RL has recently shown remarkable success, but it relies on very large numbers of samples. As this is not feasible in robotics, we review two approaches that learn intermediate state representations from previous experience: deep auto-encoders and slow feature analysis. We analyze the theoretical properties of these representations and point out potential improvements.
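
To make the second approach concrete, the following is a minimal sketch of the standard slow feature analysis (SFA) optimization problem; the notation is introduced here for illustration and is not drawn from the article. Given a sequence of observations x_t, SFA seeks feature functions phi_j whose outputs change as slowly as possible over time while remaining normalized and mutually decorrelated:

    \min_{\phi_j} \; \mathbb{E}_t\!\big[(\phi_j(x_{t+1}) - \phi_j(x_t))^2\big]
    \quad \text{s.t.} \quad
    \mathbb{E}_t[\phi_j(x_t)] = 0, \;\;
    \mathbb{E}_t[\phi_j(x_t)^2] = 1, \;\;
    \mathbb{E}_t[\phi_i(x_t)\,\phi_j(x_t)] = 0 \;\; \text{for } i < j.

The slowest features phi_1, ..., phi_k extracted from previous experience can then serve as a compact state representation on which a standard RL algorithm is trained, analogous to using the bottleneck layer of a deep auto-encoder as the state.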
