Bring Your Own Learner: A Cloud-Based, Data-Parallel Commons for Machine Learning

We introduce FCUBE, a cloud-based framework that enables machine learning researchers to contribute their learners to a community-shared repository. FCUBE exploits data parallelism, rather than algorithmic parallelization, to let its users tackle large data problems efficiently and automatically. It passes random subsets of the data, generated via resampling, to multiple learners that it executes simultaneously, and then combines their model predictions with a simple fusion technique. FCUBE is an example of what we call a Bring Your Own Learner model: it allows multiple machine learning researchers to contribute algorithms in plug-and-play style. We contend that the Bring Your Own Learner model signals a design shift in cloud-based machine learning infrastructure because it can execute anyone's supervised machine learning algorithm. We demonstrate FCUBE executing five learners contributed by three machine learning groups on a 100-node deployment on Amazon EC2, collectively solving a publicly available classification problem trained with 11 million exemplars from the Higgs dataset.
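
The workflow described above (draw a random subset of the data for each contributed learner, train the learners independently, then fuse their predictions) can be approximated with a short sketch. The following is a minimal illustration, not the FCUBE implementation; the scikit-learn learners, subset size, sampling scheme, and majority-vote fusion rule are all assumptions made for this example.

```python
# Minimal sketch of the "resample, train, fuse" idea described above.
# NOT the FCUBE codebase: learner choices, subset size, sampling scheme,
# and the majority-vote fusion rule are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def train_on_random_subsets(learners, X, y, subset_size, rng):
    """Train each contributed learner on its own random subset of the data.

    In FCUBE these trainings would run simultaneously on separate cloud
    nodes; here they run sequentially for simplicity.
    """
    fitted = []
    for learner in learners:
        # Sampling without replacement is an assumption; the actual
        # resampling scheme may differ.
        idx = rng.choice(len(X), size=subset_size, replace=False)
        fitted.append(learner.fit(X[idx], y[idx]))
    return fitted


def fuse_predictions(fitted, X):
    """Combine per-learner predictions with a simple majority vote."""
    votes = np.stack([model.predict(X) for model in fitted])  # (n_learners, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)            # binary labels in {0, 1}


if __name__ == "__main__":
    # Synthetic binary classification data standing in for a large dataset.
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    rng = np.random.default_rng(0)

    # Stand-ins for learners contributed by different research groups.
    learners = [
        LogisticRegression(max_iter=200),
        DecisionTreeClassifier(max_depth=5),
        DecisionTreeClassifier(max_depth=None),
    ]

    fitted = train_on_random_subsets(learners, X, y, subset_size=1000, rng=rng)
    y_hat = fuse_predictions(fitted, X)
    print("training-set accuracy of fused model:", (y_hat == y).mean())
```

Because each learner only ever sees a subset of the data, the per-learner training cost stays bounded as the dataset grows; the fusion step is what recovers an ensemble-level prediction over the full problem.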
