Benchmarking the generalization capabilities of a compiling genetic programming system using sparse data sets

Compiling Genetic Programming Systems ('CGPS') are advanced evolutionary algorithms that directly evolve RISC machine code. In this paper we compare the ability of CGPS to generalize with that of other machine learning ('ML') paradigms, presenting results on three classification problems. Our study comprised 720 complete CGPS runs with a population of 3000 each, amounting to over 500 billion fitness evaluations, and 480 neural network runs as benchmarks. Our results were as follows: 1. When trained on data sets that were not too sparse, CGPS performed very well, quickly and consistently equaling the generalization capability of other ML systems. 2. When trained on very sparse data sets, CGPS produced individuals that generalized almost as well as other ML systems trained on much larger data sets. 3. When CGPS and multilayer feedforward neural networks were trained on the same sparse data sets, CGPS generalized as well as, and often better than, the neural networks.
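
To make the opening definition concrete, the sketch below is a minimal, illustrative linear genetic programming loop in Python: programs are flat sequences of register-machine instructions evolved against a deliberately sparse training set. This is only an assumed toy rendering of the idea, not the authors' system; a real CGPS manipulates and executes native RISC binaries directly rather than interpreting instruction tuples, and every opcode, parameter, and the toy task here is an assumption for illustration.

```python
import random

OPS = ("add", "sub", "mul")  # toy RISC-like opcode set (assumed, not the paper's)
N_REGS = 4                   # working registers r0..r3

def random_instr():
    # (opcode, destination register, source register, constant operand)
    return (random.choice(OPS), random.randrange(N_REGS),
            random.randrange(N_REGS), random.uniform(-1.0, 1.0))

def random_program(length=8):
    return [random_instr() for _ in range(length)]

def execute(program, x):
    regs = [x, 0.0, 0.0, 0.0]  # r0 seeded with the input feature
    for op, dst, src, const in program:
        a = regs[src]
        if op == "add":
            regs[dst] = a + const
        elif op == "sub":
            regs[dst] = a - const
        else:
            regs[dst] = a * const
    return regs[0]             # r0 holds the program's output

def error(program, data):
    # Number of misclassified cases; labels are booleans, output sign decides.
    return sum((execute(program, x) > 0.0) != y for x, y in data)

def evolve(data, pop_size=200, generations=40):
    # The paper's runs used populations of 3000; smaller defaults keep
    # this sketch fast enough to run interactively.
    pop = [random_program() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: error(p, data))
        survivors = pop[: pop_size // 2]       # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]          # one-point crossover on code
            if random.random() < 0.2:          # point mutation of one instruction
                child[random.randrange(len(child))] = random_instr()
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda p: error(p, data))

if __name__ == "__main__":
    # Sparse toy task: classify whether a scalar input exceeds 0.5,
    # trained on only ten examples.
    xs = [random.uniform(0, 1) for _ in range(10)]
    train = [(x, x > 0.5) for x in xs]
    best = evolve(train)
    test = [(x / 100.0, x / 100.0 > 0.5) for x in range(100)]
    print("test errors:", error(best, test), "of", len(test))
```

The final lines mirror the study's question in miniature: the program is trained on a sparse sample and then scored on a denser held-out set to gauge generalization.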