Generalization of Agent Behavior through Explicit Representation of Context

To be deployed in digital interactive environments, autonomous agents must be able to act robustly in unseen situations. The standard machine learning approach is to include as much variation as possible in the training of these agents. The agents can then interpolate within their training experience, but they cannot extrapolate much beyond it. This paper proposes a principled approach in which a context module is coevolved with a skill module in the game. The context module recognizes temporal variation in the game and modulates the outputs of the skill module, so that action decisions can be made robustly even in previously unseen situations. The approach is evaluated in the Flappy Bird and LunarLander video games, as well as in the CARLA autonomous driving simulation. In each case, the Context+Skill approach leads to significantly more robust behavior in environments that require extrapolation beyond training. Such a principled generalization ability is essential for deploying autonomous agents in real-world tasks, and it can also serve as a foundation for continual adaptation.
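
The division of labor described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name `ContextSkillAgent`, the layer sizes, the plain tanh recurrence used for the context module, and the small controller layer that combines the two modules are hypothetical choices that the abstract does not specify. The weights are drawn at random here only so that the code runs; in the proposed approach they would be found by coevolving the modules.

```python
import math
import random


def dense(weights, bias, x):
    """Affine layer followed by tanh; `weights` is a list of rows."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Random parameters standing in for evolved ones (assumption)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, bias


class ContextSkillAgent:
    """Hypothetical agent: a stateless skill module plus a recurrent context module."""

    def __init__(self, n_obs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Skill module: maps the current observation to action features.
        self.skill = random_layer(rng, n_hidden, n_obs)
        # Context module: sees the observation and its own previous state,
        # so it can track how the environment varies over time.
        self.context = random_layer(rng, n_hidden, n_obs + n_hidden)
        # Controller: combines both outputs into action scores.
        self.controller = random_layer(rng, n_actions, 2 * n_hidden)
        self.reset()

    def reset(self):
        """Clear the context state at the start of an episode."""
        self.state = [0.0] * self.n_hidden

    def act(self, obs):
        """Return the index of the highest-scoring action for one time step."""
        obs = list(obs)
        skill_out = dense(*self.skill, obs)
        self.state = dense(*self.context, obs + self.state)
        scores = dense(*self.controller, skill_out + self.state)
        return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    agent = ContextSkillAgent(n_obs=3, n_hidden=4, n_actions=2)
    for obs in ([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], [0.5, -0.4, 0.0]):
        print(agent.act(obs))
```

Because the context state carries over between calls to `act`, the same observation can produce different action scores depending on what the agent has seen earlier in the episode, which is the kind of modulation the abstract describes. Calling `reset` between episodes keeps one episode's context from leaking into the next.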
