Unsupervised Real-Time Control Through Variational Empowerment

We introduce a method for efficiently computing a variational lower bound on empowerment, allowing it to be used as an unsupervised cost function for policy learning in real-time control. Empowerment, the channel capacity between an agent's actions and its future states, quantifies the agent's influence on its near future; it has been shown to be a good model of biological behaviour in the absence of an extrinsic goal, but it is prohibitively hard to compute, especially in nonlinear continuous spaces. Our amortised method learns empowerment-maximising policies efficiently, and we demonstrate that it reliably handles continuous dynamical systems using system dynamics learned from raw data. The resulting policies consistently drive the agents into states where they can use their full potential.
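
As background, here is a sketch of the quantities involved, in our own notation (the paper's exact parameterisation may differ): empowerment at a state s is the capacity of the channel from actions a to successor states s', and the standard Barber–Agakov construction lower-bounds the mutual information using a learned variational "planning" distribution q_ξ(a | s, s'):

    \mathcal{E}(s) \;=\; \max_{\omega(a \mid s)} I(a; s' \mid s)
                   \;=\; \max_{\omega} \; \mathbb{E}_{\omega(a \mid s)\, p(s' \mid s, a)}
                         \left[ \log \frac{p(a \mid s, s')}{\omega(a \mid s)} \right],

    I(a; s' \mid s) \;\geq\; H\!\left(\omega(\cdot \mid s)\right)
                    \;+\; \mathbb{E}_{\omega(a \mid s)\, p(s' \mid s, a)}
                          \left[ \log q_\xi(a \mid s, s') \right].

The bound holds because replacing the exact posterior p(a | s, s') with any q_ξ can only subtract a non-negative KL term. Maximising the right-hand side jointly over the action source ω and the variational distribution q_ξ (both amortised by networks over states) yields a tractable lower bound on empowerment that can serve directly as an unsupervised reward for policy learning; the symbols ω, q_ξ, and p here are our notation for illustration, not necessarily the paper's.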
