Autotuning algorithmic choice for input sensitivity

A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, complex influence by deep input features, high cost in feature extraction, and variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.

[1]  Fangzhe Chang,et al.  A Framework for Automatic Adaptation of Tunable Distributed Applications , 2004, Cluster Computing.

[2]  Jack J. Dongarra,et al.  Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.

[3]  David A. Padua,et al.  Optimizing sorting with genetic algorithms , 2005, International Symposium on Code Generation and Optimization.

[4]  Brian N. Bershad,et al.  Fast, effective dynamic compilation , 1996, PLDI '96.

[5]  Scott A. Mahlke,et al.  Adaptive input-aware compilation for graphics engines , 2012, PLDI '12.

[6]  J. Mixter Fast , 2012 .

[7]  Nancy M. Amato,et al.  A framework for adaptive algorithm selection in STAPL , 2005, PPoPP.

[8]  Una-May O'Reilly,et al.  An efficient evolutionary algorithm for solving incrementally structured problems , 2011, GECCO '11.

[9]  José M. F. Moura,et al.  Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Alogorithms , 2004, Int. J. High Perform. Comput. Appl..

[10]  Shoaib Kamil,et al.  OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[11]  Henry Hoffmann,et al.  Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.

[12]  Nagarajan Kandasamy,et al.  Enabling Self-Managing Applications using Model-based Online Control Strategies , 2006, 2006 IEEE International Conference on Autonomic Computing.

[13]  Richard W. Vuduc,et al.  Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..

[14]  I-Hsin Chung,et al.  Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[15]  Lieven Eeckhout,et al.  Iterative optimization for the data center , 2012, ASPLOS XVII.

[16]  Gabor Karsai,et al.  An Approach to Self-adaptive Software Based on Supervisory Control , 2001, IWSAS.

[17]  Xipeng Shen,et al.  A cross-input adaptive framework for GPU program optimizations , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[18]  Lieven Eeckhout,et al.  Evaluating iterative optimization across 1000 datasets , 2010, PLDI '10.

[19]  Michael Voss,et al.  High-level adaptive program optimization with ADAPT , 2001, PPoPP '01.

[20]  Alan Edelman,et al.  Language and compiler support for auto-tuning variable-accuracy algorithms , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[21]  Martin Rinard,et al.  Power-Aware Computing with Dynamic Knobs , 2010 .

[22]  Katherine Yelick,et al.  OSKI: A library of automatically tuned sparse matrix kernels , 2005 .

[23]  Yuefan Deng,et al.  New trends in high performance computing , 2001, Parallel Computing.

[24]  Cédric Bastoul,et al.  Predictive Modeling in a Polyhedral Optimization Space , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[25]  Steven G. Johnson,et al.  The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.

[26]  Michael Garland,et al.  Nitro: A Framework for Adaptive Code Variant Tuning , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[27]  James Demmel,et al.  Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.

[28]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[29]  José Nelson Amaral,et al.  Workload Reduction for Multi-input Feedback-Directed Optimization , 2009, 2009 International Symposium on Code Generation and Optimization.

[30]  Santosh Pande,et al.  Brainy: effective selection of data structures , 2011, PLDI '11.

[31]  Michael F. P. O'Boyle,et al.  MILEPOST GCC: machine learning based research compiler , 2008 .

[32]  Alexander Aiken,et al.  Data representation synthesis , 2011, PLDI '11.

[33]  Michael Voss,et al.  ADAPT: Automated De-coupled Adaptive Program Transformation , 2000, Proceedings 2000 International Conference on Parallel Processing.

[34]  Workload Reduction for Multi-Input Profile-Directed Optimization , 2009 .

[35]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[36]  L. Almagor,et al.  Finding effective compilation sequences , 2004, LCTES '04.

[37]  Xipeng Shen,et al.  An input-centric paradigm for program dynamic optimizations , 2010, OOPSLA.

[38]  Huawei Li,et al.  Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach , 2014, TACO.

[39]  Jack Dongarra,et al.  Special Issue on Program Generation, Optimization, and Platform Adaptation , 2005, Proc. IEEE.

[40]  Franz Franchetti,et al.  SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.

[41]  Michael F. P. O'Boyle,et al.  Using machine learning to focus iterative optimization , 2006, International Symposium on Code Generation and Optimization (CGO'06).

[42]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[43]  Katherine A. Yelick,et al.  Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.

[44]  Alan Edelman,et al.  PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.

[45]  Henry Hoffmann,et al.  Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments , 2010, ICAC '10.

[46]  Martin C. Rinard,et al.  Dynamic feedback: an effective technique for adaptive computing , 1997, PLDI '97.

[47]  Martin Rinard,et al.  Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures , 2009 .

[48]  Albert Cohen,et al.  Quick and Practical Run-Time Evaluation of Multiple Program Optimizations , 2007, Trans. High Perform. Embed. Archit. Compil..

[49]  Jack J. Dongarra,et al.  Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..