Reinforcement Learning with Subspaces using Free Energy Paradigm

In large-scale problems, standard reinforcement learning algorithms suffer from slow learning. In this paper, we follow the framework of using subspaces to tackle this problem. We propose a free-energy minimization framework for selecting the subspaces and for integrating the policy of the full state space into the subspaces. The framework rests on the Thompson sampling policy and on the behavioral policies of the subspaces and the state space, and is therefore applicable to a variety of settings: discrete or continuous state spaces, and model-free or model-based tasks. Through a set of experiments, we show that this general framework substantially improves learning speed. We also provide a convergence proof.
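The abstract does not state the objective explicitly, but the free-energy formulation standard in bounded-rational decision making trades expected utility against the information cost of deviating from a prior policy. The following is a minimal sketch of that formulation, assuming a utility U(a) (e.g., a sampled Q-value estimate), a prior (behavioral) policy q, and an inverse temperature \beta; these symbols are illustrative assumptions, not the paper's exact construction:

F[p] = \mathbb{E}_{a \sim p}\left[-U(a)\right] + \frac{1}{\beta}\,\mathrm{KL}\!\left(p \,\|\, q\right), \qquad p^{*}(a) = \frac{q(a)\,e^{\beta U(a)}}{\sum_{a'} q(a')\,e^{\beta U(a')}}.

Minimizing this free energy over policies p yields the softmax policy p^{*}, which interpolates between following the prior q (small \beta) and greedily maximizing U (large \beta). Under one plausible reading of the abstract, Thompson sampling supplies the stochastic utility estimates for each subspace, while the behavioral policies of the subspaces and the full state space play the role of the prior.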
