Blind Optimization for Exploiting Hardware Features

Software systems typically exploit only a small fraction of the realizable performance from the underlying microprocessors. While there has been much work on hardware-aware optimizations, two factors limit their benefit. First, microprocessors are so complex that it is unlikely that even an aggressively optimizing compiler will be able to satisfy all the constraints necessary to obtain the best performance. Thus, most optimizations use a simplified model of the hardware (e.g., they may be cache-aware but they may ignore other hardware structures, such as TLBs, etc.). Second, hardware manufacturers do not reveal all details of their microprocessors so even if the authors of optimizations wanted to simultaneously optimize for all components of the hardware, they may be unable to do so because they are working with limited knowledge. This paper presents and evaluates our blind optimization approach which provides a way to get around these issues. Blind optimization uses the insight that we can generate many variants of an application by altering semantic preserving parameters of an application; for example our variants can cover the space of code and data layout by shifting the positions of code and data in memory. Our optimization strategy attempts to find a variant that performs well with respect to an optimization objective. We show that even our first implementation of blind optimization speeds up a number of programs from the SPECint 2006 benchmark suite.

[1]  Rudolf Eigenmann,et al.  Fast and effective orchestration of compiler optimizations for automatic performance tuning , 2006, International Symposium on Code Generation and Optimization (CGO'06).

[2]  Brad Calder,et al.  Efficient procedure mapping using cache line coloring , 1997, PLDI '97.

[3]  Andrew G. Barto,et al.  Building a Basic Block Instruction Scheduler with Reinforcement Learning and Rollouts , 2002, Machine Learning.

[4]  Matthias Hauswirth,et al.  Vertical profiling: understanding the behavior of object-priented applications , 2004, OOPSLA.

[5]  Michael F. P. O'Boyle,et al.  Automatic Tuning of Inlining Heuristics , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[6]  Michael D. Smith,et al.  Procedure placement using temporal-ordering information , 1999, TOPL.

[7]  Atanas Rountev,et al.  Off-line variable substitution for scaling points-to analysis , 2000, PLDI '00.

[8]  Matthew Arnold,et al.  Adaptive optimization in the Jalapeno JVM , 2000, SIGP.

[9]  Brad Calder,et al.  Online performance auditing: using hot optimizations without getting burned , 2006, PLDI '06.

[10]  Hiroyuki Tomiyama,et al.  Code placement techniques for cache miss rate reduction , 1997, TODE.

[11]  David I. August,et al.  Compiler optimization-space exploration , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[12]  Matthew Arnold,et al.  Online feedback-directed optimization of Java , 2002, OOPSLA '02.

[13]  Henry Massalin Superoptimizer: a look at the smallest program , 1987, ASPLOS 1987.

[14]  Scott McFarling,et al.  Procedure merging with instruction caches , 1991, PLDI '91.

[15]  Daniel A. Jiménez,et al.  Code placement for improving dynamic branch prediction accuracy , 2005, PLDI '05.

[16]  Jack W. Davidson,et al.  Profile guided code positioning , 1990, SIGP.

[17]  Keith D. Cooper,et al.  Adaptive Optimizing Compilers for the 21st Century , 2002, The Journal of Supercomputing.

[18]  Scott McFarling,et al.  Program optimization for instruction caches , 1989, ASPLOS III.

[19]  Jack J. Dongarra,et al.  A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[20]  Matthias Hauswirth,et al.  Producing wrong data without doing anything obviously wrong! , 2009, ASPLOS.

[21]  Gavin Brown,et al.  Intelligent selection of application-specific garbage collectors , 2007, ISMM '07.

[22]  Amer Diwan,et al.  Understanding the behavior of compiler optimizations , 2006, Softw. Pract. Exp..

[23]  John Cavazos,et al.  Inducing heuristics to decide whether to schedule , 2004, PLDI '04.

[24]  Dirk Grunwald,et al.  Reducing branch costs via branch alignment , 1994, ASPLOS VI.

[25]  Matthew Arnold,et al.  Adaptive optimization in the Jalapeño JVM , 2000, OOPSLA '00.