An investigation of supervised learning in genetic programming

This thesis is an investigation into Supervised Learning (SL) in Genetic Programming (GP). With its flexible tree-structured representation, GP is a type of Genetic Algorithm which uses the Darwinian ideas of natural selection and genetic recombination, evolving populations of solutions over many generations to solve problems. SL is a common approach in Machine Learning where the problem is presented as a set of examples. A good or fit solution is one which can successfully deal with all of the examples. In common with most Machine Learning approaches, GP has been used to solve many trivial problems. When applied to larger and more complex problems, however, several difficulties become apparent.

Focusing on the basic features of GP, this thesis highlights the immense size of the GP search space, and describes an approach to measuring this space. A stupendously flexible but frustratingly useless representation, Anarchically Automatically Defined Functions, is described. Some difficulties associated with the normal use of the GP operator Crossover (perhaps the most common method of combining GP trees to produce new trees) are demonstrated in the simple MAX problem. Crossover can lead to irreversibly sub-optimal GP performance when used in combination with a restriction on tree size. There is a brief study of tournament selection, a common method of selecting fit individuals from a GP population to act as parents in the construction of the next generation.

The main contributions of this thesis, however, are two approaches for avoiding the fitness evaluation bottleneck resulting from the use of SL in GP. To establish the capability of a GP individual using SL, it must be tested or evaluated against each example in the set of training examples.
Given that there can be a large set of training examples, a large population of individuals, and a large number of generations before good solutions emerge, a very large number of evaluations must be carried out, often many tens of millions. This is by far the most time-consuming stage of the GP algorithm. Limited Error Fitness (LEF) and Dynamic Subset Selection (DSS) both reduce the number of evaluations needed by GP to successfully produce good solutions, adaptively using the capabilities of the current generation of individuals to guide the evaluation of the next generation. LEF curtails the fitness evaluation of an individual once it exceeds an error limit, whereas DSS picks out a subset of examples from the training set for each generation. Whilst LEF allows GP to solve the comparatively small but difficult Boolean Even N parity problem for large N without the use of a more powerful representation such as Automatically Defined Functions, DSS in particular has been successful in improving the performance of GP on two large classification problems, allowing the use of smaller population sizes and many fewer and faster evaluations, and more reliably producing solutions as good as or better than those of GP on its own. The thesis ends with an assertion that smaller populations evolving over many generations can perform more consistently and produce better results than the `established' approach of using large populations over few generations.

Acknowledgements

I'd like to take this opportunity to thank the many people who have assisted, coerced, guided, bullied, ridiculed, encouraged, or otherwise contributed to my completing this thesis. Many thanks to my supervisor, Dr. Peter Ross, for knowing the answers to many questions. Many thanks to Dave Corne for lending an ear, and likewise to his better-groomed replacement, Emma Hart.
Many thanks to the attendees and organisers of the GP96 and GP97 conferences for many inspiring conversations and talks, and especially to Bill Langdon for his many helpful comments. Many thanks to the denizens of E17 and E19 for the endlessly diverting chats. Many thanks to many other people. And last but not least, many thanks to SERC, who became EPSRC, for funding nearly the whole of my PhD with grant number 93314680.

Declaration

I hereby declare that I composed this thesis entirely myself and that it describes my own research.

C. S. Gathercole
Edinburgh
March 22, 1998
