Robust non-linear control through neuroevolution

Many complex control problems require sophisticated solutions that are not amenable to traditional controller design. Not only is it difficult to model real-world systems, but it is often unclear what kind of behavior is required to solve the task. Reinforcement learning approaches have made progress on such problems, but have so far not scaled well. Neuroevolution has improved upon conventional reinforcement learning, but has still not been successful in full-scale, non-linear control problems. This dissertation develops a three-component methodology for solving real-world control tasks: (1) an efficient neuroevolution algorithm that solves difficult non-linear control tasks by coevolving neurons, (2) an incremental evolution method that scales the algorithm to the most challenging tasks, and (3) a technique for making controllers robust so that they can transfer from simulation to the real world. The method is faster than other approaches on a set of difficult learning benchmarks, and it is applied to two full-scale control tasks, demonstrating its applicability to real-world problems.
