Evolutionary computation versus reinforcement learning

Many applications of reinforcement learning (RL) and evolutionary computation (EC) address the same problem: maximizing some agent's fitness function in a potentially unknown environment. The most challenging open issues in such applications include partial observability of the agent's environment, hierarchical and other types of abstract credit assignment, and the learning of credit assignment algorithms. I summarize why EC provides a more natural framework for addressing these issues than RL based on value functions and dynamic programming. Then I point out fundamental drawbacks of traditional EC methods in the case of stochastic environments, stochastic policies, and unknown temporal delays between actions and their observable effects. I discuss a remedy called the success-story algorithm, which combines aspects of RL and EC.
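
The success-story algorithm mentioned above keeps a stack of still-valid policy modifications and, at checkpoints, undoes those that have not accelerated reward intake. The sketch below illustrates only this core bookkeeping idea on a toy policy represented as a parameter dictionary; the class name `SSAgent`, the dict-based policy, and the checkpointing interface are illustrative assumptions, not the original self-modifying-policy formulation.

```python
# Minimal sketch of the success-story idea: keep modifications on a stack and
# undo those after which the reward-per-time rate has not increased.
# (Toy formulation; names and data structures here are assumptions.)

class SSAgent:
    """Toy agent that records policy modifications so that unsuccessful
    ones can be undone at checkpoints."""

    def __init__(self, policy):
        self.policy = dict(policy)  # current, modifiable policy parameters
        self.t = 0                  # logical time (number of steps so far)
        self.R = 0.0                # cumulative reward so far
        self.stack = []             # entries: (t_mod, R_mod, key, old_value)

    def step(self, reward):
        """Advance logical time and accumulate external reward."""
        self.t += 1
        self.R += reward

    def modify(self, key, new_value):
        """Apply a policy modification and remember how to undo it."""
        self.stack.append((self.t, self.R, key, self.policy.get(key)))
        self.policy[key] = new_value

    def _criterion_holds(self):
        """Success-story criterion: reward per time since each still-valid
        modification must be strictly increasing from the oldest entry to
        the most recent one, starting from the lifelong average rate."""
        rates = [self.R / max(self.t, 1)]  # lifelong average as the baseline
        for t_mod, R_mod, _, _ in self.stack:
            rates.append((self.R - R_mod) / max(self.t - t_mod, 1))
        return all(a < b for a, b in zip(rates, rates[1:]))

    def checkpoint(self):
        """Pop (undo) the most recent modifications until the criterion
        holds; what survives is, by construction, a 'success story'."""
        while self.stack and not self._criterion_holds():
            _, _, key, old_value = self.stack.pop()
            self.policy[key] = old_value


# Example usage: a modification survives only if reward intake speeds up.
agent = SSAgent({"explore": 0.5})
for r in [0.0, 0.0, 0.0, 1.0]:      # reward rate 0.25 before any change
    agent.step(r)
agent.modify("explore", 0.8)        # a self-generated policy change
for r in [1.0, 0.0, 1.0, 1.0]:      # reward rate 0.75 after the change
    agent.step(r)
agent.checkpoint()                  # the change is kept: reward/time went up
print(agent.policy)                 # {'explore': 0.8}
```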
