Reinforcement Learning with Subspaces using Free Energy Paradigm

In large-scale problems, standard reinforcement learning algorithms suffer from slow learning. In this paper, we follow the framework of using subspaces to tackle this problem. We propose a free-energy minimization framework for selecting the subspaces and for integrating the policy of the full state space into the subspaces. The framework rests on the Thompson sampling policy and on the behavioral policies of the subspaces and the state space, and is therefore applicable to a variety of settings: discrete or continuous state spaces, and model-free or model-based tasks. Through a set of experiments, we show that this general framework substantially improves learning speed. We also provide a convergence proof.
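The abstract does not state the objective explicitly, but the free-energy formulation standard in bounded-rational decision making trades expected utility against the information cost of deviating from a prior policy. The following is a minimal sketch of that formulation, assuming a utility U(a) (e.g., a sampled Q-value estimate), a prior (behavioral) policy q, and an inverse temperature \beta; these symbols are illustrative assumptions, not the paper's exact construction:

F[p] = \mathbb{E}_{a \sim p}\left[-U(a)\right] + \frac{1}{\beta}\,\mathrm{KL}\!\left(p \,\|\, q\right), \qquad p^{*}(a) = \frac{q(a)\,e^{\beta U(a)}}{\sum_{a'} q(a')\,e^{\beta U(a')}}.

Minimizing this free energy over policies p yields the softmax policy p^{*}, which interpolates between following the prior q (small \beta) and greedily maximizing U (large \beta). Under one plausible reading of the abstract, Thompson sampling supplies the stochastic utility estimates for each subspace, while the behavioral policies of the subspaces and the full state space play the role of the prior.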
