Scaling Genetic Algorithms Using MapReduce

Genetic algorithms (GAs) are increasingly being applied to large-scale problems. Traditional MPI-based parallel GAs require detailed knowledge of the machine architecture. MapReduce, by contrast, is a powerful abstraction proposed by Google for building scalable and fault-tolerant applications. In this paper, we show how genetic algorithms can be expressed in the MapReduce model. We describe the algorithm design and implementation of GAs on Hadoop, an open-source implementation of MapReduce. Our experiments demonstrate convergence and scalability on problems with up to 10^5 variables. Because we do not introduce any performance bottlenecks, adding more resources would enable us to solve even larger problems without any changes to the algorithms or the implementation.
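
As a rough illustration of how one GA generation might be expressed as a map step and a reduce step, the following is a minimal single-process sketch. It is not the paper's Hadoop implementation. The OneMax objective, the random routing of individuals to reducers, the tournament size, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `run_generation`, `run_ga`) are assumptions made for this example only.

```python
import random
from collections import defaultdict


def fitness(bits):
    # OneMax (count of ones): an illustrative objective assumed for this sketch.
    return sum(bits)


def map_phase(population, num_reducers, rng):
    # Map: score each individual independently and route it to a random
    # reducer key (an assumed partitioning choice).
    for individual in population:
        yield rng.randrange(num_reducers), (individual, fitness(individual))


def shuffle(pairs):
    # Group mapper output by key, as a MapReduce runtime would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values, rng, tournament_size=3):
    # Reduce: tournament selection followed by uniform crossover.
    # Emits as many offspring as individuals received, so population size is kept.
    def pick_parent():
        contenders = [rng.choice(values) for _ in range(tournament_size)]
        return max(contenders, key=lambda pair: pair[1])[0]

    offspring = []
    while len(offspring) < len(values):
        a, b = pick_parent(), pick_parent()
        child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
        offspring.append(child)
    return offspring


def run_generation(population, num_reducers, rng):
    # One generation = one simulated map/shuffle/reduce round.
    groups = shuffle(map_phase(population, num_reducers, rng))
    next_population = []
    for key in sorted(groups):
        next_population.extend(reduce_phase(groups[key], rng))
    return next_population


def run_ga(pop_size=40, num_bits=16, generations=10, num_reducers=4, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(num_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population = run_generation(population, num_reducers, rng)
    return population


if __name__ == "__main__":
    final = run_ga()
    print(max(fitness(ind) for ind in final))
```

In this sketch each generation corresponds to one simulated job. Fitness evaluation parallelizes trivially in the map step because individuals are scored independently, while selection and crossover operate only on the individuals that happen to share a reducer key.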
