OpenTuner: An extensible framework for program autotuning

Program autotuning has been shown to achieve better or more portable performance in a number of domains. However, autotuners themselves are rarely portable between projects, for a number of reasons: using a domain-informed search space representation is critical to achieving good results; search spaces can be intractably large and require advanced machine learning techniques; and the landscape of search spaces can vary greatly between different problems, sometimes requiring domain specific search techniques to explore efficiently. This paper introduces OpenTuner, a new open source framework for building domain-specific multi-objective program autotuners. OpenTuner supports fully-customizable configuration representations, an extensible technique representation to allow for domain-specific techniques, and an easy to use interface for communicating with the program to be autotuned. A key capability inside OpenTuner is the use of ensembles of disparate search techniques simultaneously; techniques that perform well will dynamically be allocated a larger proportion of tests. We demonstrate the efficacy and generality of OpenTuner by building autotuners for 7 distinct projects and 16 total benchmarks, showing speedups over prior techniques of these projects of up to 2.8χ with little programmer effort.

[1]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[2]  Hans Werner Meuer,et al.  Top500 Supercomputer Sites , 1997 .

[3]  Jack J. Dongarra,et al.  Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.

[4]  Gabor Karsai,et al.  An Approach to Self-adaptive Software Based on Supervisory Control , 2001, IWSAS.

[5]  I-Hsin Chung,et al.  Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[6]  Jack J. Dongarra,et al.  The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..

[7]  L. Almagor,et al.  Finding effective compilation sequences , 2004, LCTES '04.

[8]  Fangzhe Chang,et al.  A Framework for Automatic Adaptation of Tunable Distributed Applications , 2004, Cluster Computing.

[9]  José M. F. Moura,et al.  Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Alogorithms , 2004, Int. J. High Perform. Comput. Appl..

[10]  Katherine Yelick,et al.  OSKI: A library of automatically tuned sparse matrix kernels , 2005 .

[11]  Steven G. Johnson,et al.  The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.

[12]  Michael F. P. O'Boyle,et al.  Using machine learning to focus iterative optimization , 2006, International Symposium on Code Generation and Optimization (CGO'06).

[13]  Nagarajan Kandasamy,et al.  Enabling Self-Managing Applications using Model-based Online Control Strategies , 2006, 2006 IEEE International Conference on Autonomic Computing.

[14]  Michael F. P. O'Boyle,et al.  MILEPOST GCC: machine learning based research compiler , 2008 .

[15]  Martin Rinard,et al.  Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures , 2009 .

[16]  Alan Edelman,et al.  Autotuning multigrid with PetaBricks , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[17]  Alan Edelman,et al.  PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.

[18]  Michèle Sebag,et al.  Analyzing bandit-based adaptive operator selection mechanisms , 2010, Annals of Mathematics and Artificial Intelligence.

[19]  Henry Hoffmann,et al.  Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments , 2010, ICAC '10.

[20]  Martin Rinard,et al.  Power-Aware Computing with Dynamic Knobs , 2010 .

[21]  Samuel Williams,et al.  An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[22]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[23]  Michèle Sebag,et al.  Comparison-Based Adaptive Strategy Selection with Bandits in Differential Evolution , 2010, PPSN.

[24]  Henry Hoffmann,et al.  Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.

[25]  Una-May O'Reilly,et al.  An efficient evolutionary algorithm for solving incrementally structured problems , 2011, GECCO '11.

[26]  Helmar Burkhart,et al.  PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[27]  Albert Cohen,et al.  Predictive modeling in a polyhedral optimization space , 2011, CGO 2011.

[28]  Thomas Fahringer,et al.  A multi-objective auto-tuning framework for parallel codes , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[29]  Una-May O'Reilly,et al.  Hyperparameter Tuning in Bandit-Based Adaptive Operator Selection , 2012, EvoApplications.

[30]  Frédo Durand,et al.  Decoupling algorithms from schedules for easy optimization of image processing pipelines , 2012, ACM Trans. Graph..

[31]  Shoaib Ashraf Kamil,et al.  Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages , 2012 .

[32]  Dr. Tom Murphy The First Level of Super Mario Bros . is Easy with Lexicographic Orderings and Time Travel , 2013 .

[33]  Frédo Durand,et al.  Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.