Discovering Multimodal Behavior in Ms. Pac-Man Through Evolution of Modular Neural Networks

Ms. Pac-Man is a challenging video game that requires multiple modes of behavior: Ms. Pac-Man must escape ghosts when they are threats and catch them when they are edible, in addition to eating all pills in each level. Past approaches to learning behavior in Ms. Pac-Man have treated the game as a single task to be learned using monolithic policy representations. In contrast, this paper uses a framework called Modular Multiobjective NEAT (MM-NEAT) to evolve modular neural networks. Each module defines a separate behavior. The modules are used at different times according to a policy that is either human-designed (the multitask approach) or discovered automatically by evolution. The number of modules is either fixed in advance or discovered using a genetic operator called module mutation, several versions of which are evaluated in this paper. Both fixed modular networks and module mutation networks outperform monolithic networks and multitask networks. Interestingly, the best networks dedicate modules to critical behaviors (such as escaping when surrounded after luring ghosts near a power pill) that do not follow the customary division of the game into chasing edible ghosts and escaping threatening ghosts. The results demonstrate that MM-NEAT can discover interesting and effective behavior for agents in challenging games.
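The modular architecture described above can be illustrated with a minimal Python sketch. It is not the MM-NEAT implementation: the linear modules, the per-module preference score used for evolved arbitration, the `task_rule` callback standing in for the human-designed multitask policy, and the duplication-style `module_mutation` are all assumptions made for illustration, and every name below is hypothetical.

```python
import copy
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Module:
    """One behavior module (assumed linear here, purely for brevity)."""
    action_weights: List[List[float]]  # one weight row per available action
    preference_weights: List[float]    # hypothetical: scores how much this module "wants" control

    def action_scores(self, inputs: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, inputs)) for row in self.action_weights]

    def preference(self, inputs: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.preference_weights, inputs))


def random_module(n_inputs: int, n_actions: int, rng: random.Random) -> Module:
    """Create a module with random weights in [-1, 1]."""
    return Module(
        [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_actions)],
        [rng.uniform(-1, 1) for _ in range(n_inputs)],
    )


@dataclass
class ModularPolicy:
    modules: List[Module]
    # If set, a human-designed rule picks the module (the multitask case).
    # If None, the module with the highest preference score is used.
    task_rule: Optional[Callable[[Sequence[float]], int]] = None

    def choose_module(self, inputs: Sequence[float]) -> int:
        if self.task_rule is not None:
            return self.task_rule(inputs)
        prefs = [m.preference(inputs) for m in self.modules]
        return prefs.index(max(prefs))  # ties go to the lowest index

    def act(self, inputs: Sequence[float]) -> int:
        scores = self.modules[self.choose_module(inputs)].action_scores(inputs)
        return scores.index(max(scores))


def module_mutation(policy: ModularPolicy, rng: random.Random) -> ModularPolicy:
    """Return a new policy with one extra module.

    Assumption: the new module is a copy of a randomly chosen existing one,
    so behavior is unchanged until later mutations alter the copy.
    """
    source = rng.choice(policy.modules)
    return ModularPolicy(policy.modules + [copy.deepcopy(source)], policy.task_rule)


if __name__ == "__main__":
    rng = random.Random(0)
    evolved = ModularPolicy([random_module(3, 4, rng) for _ in range(2)])
    grown = module_mutation(evolved, rng)
    print(len(evolved.modules), len(grown.modules))
```

In this sketch, a multitask policy would be built by passing a rule such as `task_rule=lambda s: 1 if s[0] > 0.5 else 0`, where `s[0]` is a hypothetical "ghosts are edible" input. Leaving `task_rule` unset lets the preference scores decide which module controls the agent at each step.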
