Sample, Estimate, Tune: Scaling Bayesian Auto-Tuning of Data Science Pipelines

In this paper, we describe a system for sequential hyperparameter optimization that scales to complex pipelines and large datasets. The current state of the art in hyperparameter optimization improves on random and grid search by using sequential Bayesian optimization to explore the hyperparameter space in a more informed way. These methods do not scale, however, because the entire data science pipeline must still be evaluated on all of the data. By designing a subsampling-based approach to estimate pipeline performance, together with a distributed evaluation system, we provide a scalable solution, which we illustrate on complex image and text pipelines. For three pipelines, we show that we achieve similar performance improvements while computing on substantially less data.
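The core idea can be illustrated with a minimal sketch (not the authors' system): a pipeline's score is estimated on a random subsample of the data, and that cheap estimate is fed to a sequential Bayesian optimizer in place of a full-data evaluation. The example below assumes scikit-learn and SciPy, uses a toy Gaussian-process / expected-improvement loop as a stand-in for a production optimizer, and tunes a single hypothetical hyperparameter (the SVM regularization constant C) on the digits dataset.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_digits
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

def subsampled_score(log_C, frac=0.2):
    """Estimate pipeline accuracy on a random subsample instead of the full data."""
    idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    pipe = make_pipeline(StandardScaler(), SVC(C=10 ** log_C))
    return cross_val_score(pipe, X[idx], y[idx], cv=3).mean()

# Sequential Bayesian optimization over log10(C) in [-3, 3].
candidates = np.linspace(-3, 3, 200).reshape(-1, 1)
H = [[-3.0], [0.0], [3.0]]                      # initial design points
s = [subsampled_score(h[0]) for h in H]         # cheap subsampled estimates

for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(H, s)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = max(s)
    z = (mu - best) / (sigma + 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    nxt = candidates[np.argmax(ei)]
    H.append(list(nxt))
    s.append(subsampled_score(nxt[0]))

print("best log10(C):", H[int(np.argmax(s))][0], "estimated accuracy:", max(s))
```

Because each evaluation touches only a fraction of the data, many more hyperparameter configurations can be tried for the same budget; the paper's contribution is in making such subsampled estimates reliable and distributing the evaluations, which this sketch does not attempt.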
