Flash: A GP-GPU Ensemble Learning System for Handling Large Datasets

The Flash system runs ensemble-based Genetic Programming (GP) symbolic regression on a shared-memory desktop. To reduce the high time cost of the extensive model predictions that symbolic regression requires, it offloads fitness evaluations to the desktop's GPU. Successive GP "instances" are run on different data subsets with randomly chosen objective functions. The best models are collected after a fixed number of generations and then fused with an adaptive, output-space method. New instance launches are halted once learning is complete. We demonstrate that Flash's ensemble strategy not only makes GP more robust but also provides an informed, online means of halting the learning process. Flash enables GP to learn from a dataset of 370K exemplars and 90 features, evolving a population of 1000 individuals over 100 generations in as few as 50 seconds.
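The control flow described above (successive instances on data subsets, random objectives, output-space fusion, online halting) can be illustrated with a minimal CPU-only sketch. It is not Flash's implementation: the GP run is replaced by a trivial stand-in learner (`run_instance` picks the best of a few least-squares lines), the fusion uses simple inverse-error weights rather than the adaptive method the system uses, and the halting rule (`patience`, `tol`) is an assumed plateau test. All function and parameter names are hypothetical.

```python
import random

def mse(model, data):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)

def mae(model, data):
    a, b = model
    return sum(abs(a * x + b - y) for x, y in data) / len(data)

def fit_line(points):
    # Ordinary least-squares fit of y = a*x + b.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def run_instance(train, rng, subset_size=200, candidates=5):
    # Stand-in for one GP instance (assumption): draw a data subset,
    # pick an objective at random, return the best candidate model.
    subset = rng.sample(train, subset_size)
    objective = rng.choice([mse, mae])
    models = [fit_line(rng.sample(subset, 20)) for _ in range(candidates)]
    return min(models, key=lambda m: objective(m, subset))

def fuse_weights(models, valid, eps=1e-12):
    # Output-space fusion stand-in (assumption): inverse-MSE weights.
    inv = [1.0 / (mse(m, valid) + eps) for m in models]
    total = sum(inv)
    return [v / total for v in inv]

def fused_predict(models, weights, x):
    return sum(w * (a * x + b) for w, (a, b) in zip(weights, models))

def fused_mse(models, weights, data):
    return sum((fused_predict(models, weights, x) - y) ** 2
               for x, y in data) / len(data)

def learn_ensemble(train, valid, rng, max_instances=50, patience=3, tol=1e-3):
    # Launch instances one after another; stop once the fused validation
    # error has not improved by a relative `tol` for `patience` launches.
    models, weights = [], []
    best, stale, err = float("inf"), 0, float("inf")
    for _ in range(max_instances):
        models.append(run_instance(train, rng))
        weights = fuse_weights(models, valid)
        err = fused_mse(models, weights, valid)
        if err < best * (1 - tol):
            best, stale = err, 0
        else:
            stale += 1
        if stale >= patience:
            break
    return models, weights, err

# Usage on synthetic data y = 2x + 1 + noise (illustrative only).
rng = random.Random(0)
points = [(x, 2 * x + 1 + rng.gauss(0, 0.1))
          for x in (rng.uniform(-1, 1) for _ in range(1300))]
models, weights, err = learn_ensemble(points[:1000], points[1000:], rng)
print(len(models), "instances launched, fused validation MSE:", err)
```

The fused error is recomputed after every launch, so the same quantity that combines the models also decides when to stop launching new ones, which is the sense in which the halting is "online".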
