Balancing Accuracy and Parsimony in Genetic Programming

Genetic programming is distinguished from other evolutionary algorithms in that it uses tree representations of variable size instead of linear strings of fixed length. This flexible representation is important because it allows the underlying structure of the data to be discovered automatically. A primary difficulty, however, is that solutions may grow excessively large without any corresponding improvement in their generalization ability. In this article we investigate the fundamental relationship between the performance and the complexity of the evolved structures. The essence of the parsimony problem is demonstrated empirically by analyzing error landscapes of programs evolved for neural network synthesis. We treat genetic programming as a statistical inference problem and apply the Bayesian model-comparison framework to derive a class of fitness functions with separate error and complexity terms. An adaptive learning method is then presented that automatically balances the model-complexity factor, evolving parsimonious programs without losing the population diversity needed to achieve the desired training accuracy. The effectiveness of this approach is demonstrated empirically on the induction of sigma-pi neural networks for a real-world medical diagnosis problem as well as for benchmark tasks.
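
As a rough illustration of the error-plus-complexity fitness functions and the adaptive balancing the abstract describes, the following Python sketch combines a training error with a weighted complexity term and adjusts the weight each generation. This is a minimal sketch under stated assumptions: the update rule, thresholds, and helper names (target_error, select_and_vary, the .error and .size attributes) are illustrative, not the paper's exact formulation.

    # Sketch of a parsimony-pressured fitness with an adaptively balanced
    # complexity term. The heuristic: keep the complexity coefficient small
    # while the population still needs to reduce error (preserving diversity),
    # and increase it once the best program reaches the desired training
    # accuracy (pruning toward parsimonious programs). The specific update
    # factors below are assumptions for illustration.

    def fitness(error: float, complexity: float, alpha: float) -> float:
        """Combined fitness: training error plus weighted complexity."""
        return error + alpha * complexity

    def update_alpha(alpha: float, best_error: float,
                     target_error: float = 0.01,
                     grow: float = 1.5, shrink: float = 0.5) -> float:
        """Strengthen parsimony pressure once accuracy is sufficient,
        relax it while the population must still reduce error."""
        if best_error <= target_error:
            return alpha * grow               # accuracy reached: prune size
        return max(alpha * shrink, 1e-8)      # still learning: weak pressure

    # Usage inside a generic GP generation loop; select_and_vary and the
    # population's .error / .size attributes are hypothetical placeholders:
    #
    # alpha = 1e-4
    # for gen in range(max_generations):
    #     scores = [fitness(p.error, p.size, alpha) for p in population]
    #     population = select_and_vary(population, scores)
    #     alpha = update_alpha(alpha, min(p.error for p in population))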
