Efficient architectural design space exploration via predictive modeling

Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation intractable. We attack this via an automated approach that builds accurate predictive models. We simulate sampled points, using results to teach our models the function describing relationships among design parameters. The models can be queried and are very fast, enabling efficient design tradeoff discovery. We validate our approach via two uniprocessor sensitivity studies, predicting IPC with only 1--2% error. In an experimental study using the approach, training on 1% of a 250-K-point CMP design space allows our models to predict performance with only 4--5% error. Our predictive modeling combines well with techniques that reduce the time taken by each simulation experiment, achieving net time savings of three-four orders of magnitude.

[1]  David M. Brooks,et al.  Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.

[2]  Margaret Martonosi,et al.  NSF Computer Performance Evaluation Workshop: Summary and Action Items , 2002 .

[3]  Rich Caruana,et al.  Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping , 2000, NIPS.

[4]  Marianne Winslett,et al.  Automatic and portable performance modeling for parallel I/O: a machine-learning approach , 2002, PERV.

[5]  Kapil Vaswani,et al.  A Predictive Performance Model for Superscalar Processors , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[6]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[7]  Norman P. Jouppi,et al.  CACTI: an enhanced cache access and cycle time model , 1996, IEEE J. Solid State Circuits.

[8]  Hsien-Hsin S. Lee,et al.  Constructing a Non-Linear Model with Neural Networks for Workload Characterization , 2006, 2006 IEEE International Symposium on Workload Characterization.

[9]  Thomas M. Conte,et al.  Reducing state loss for effective trace sampling of superscalar processors , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.

[10]  Bruce Jacob,et al.  A Case for Studying DRAM Issues at the System Level , 2003, IEEE Micro.

[11]  Kunle Olukotun,et al.  Maximizing CMP throughput with mediocre cores , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[12]  Gerald Tesauro,et al.  Temporal difference learning and TD-Gammon , 1995, CACM.

[13]  C. Marzban A Neural Network for Tornado Diagnosis: Managing Local Minima , 2000, Neural Computing & Applications.

[14]  Anish Muttreja,et al.  Automated Energy/Performance Macromodeling of Embedded Software , 2004, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[15]  Kevin Skadron,et al.  Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation , 2003, 2003 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2003..

[16]  John Paul Shen,et al.  Theoretical modeling of superscalar processor performance , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[17]  A. J. KleinOsowski,et al.  MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research , 2002, IEEE Computer Architecture Letters.

[18]  Foster J. Provost,et al.  Active Learning for Class Probability Estimation and Ranking , 2001, IJCAI.

[19]  Yuan Xie,et al.  Intel Corporation , 2005 .

[20]  James E. Smith,et al.  A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[21]  Foster J. Provost,et al.  Active Sampling for Class Probability Estimation and Ranking , 2004, Machine Learning.

[22]  Sally A. McKee,et al.  Efficiently exploring architectural design spaces via predictive modeling , 2006, ASPLOS XII.

[23]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[24]  Edward S. Davidson,et al.  Computer system design using a hierarchical approach to performance evaluation , 1980, CACM.

[25]  Thomas F. Wenisch,et al.  TurboSMARTS: accurate microarchitecture simulation sampling in minutes , 2005, SIGMETRICS '05.

[26]  Lieven Eeckhout,et al.  Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[27]  Dean M. Tullsen,et al.  Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[28]  Stijn Eyerman,et al.  The shape of the processor design space and its implications for early stage explorations , 2005 .

[29]  Pradeep Dubey,et al.  Platform 2015: Intel ® Processor and Platform Evolution for the Next Decade , 2005 .

[30]  Roland E. Wunderlich,et al.  SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[31]  Louise Trevillyan,et al.  Representative traces for processor models with infinite cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[32]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[33]  Anish Muttreja,et al.  Hybrid simulation for embedded software energy estimation , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[34]  Frederic T. Chong,et al.  HLS: combining statistical and symbolic simulation to guide microprocessor designs , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[35]  Dean A. Pomerleau,et al.  Knowledge-Based Training of Artificial Neural Networks for Autonomous Robot Driving , 1993 .

[36]  Kapil Vaswani,et al.  Construction and use of linear regression models for processor performance analysis , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[37]  Mikko H. Lipasti,et al.  A performance methodology for commercial servers , 2000, IBM J. Res. Dev..

[38]  Kevin Skadron,et al.  CMP design space exploration subject to physical constraints , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[39]  Caren Marzban,et al.  A neural network for tornado diagnosis , 2000 .

[40]  Lieven Eeckhout,et al.  Quantifying the Impact of Input Data Sets on Program Behavior and its Applications , 2003, J. Instr. Level Parallelism.

[41]  Lieven Eeckhout,et al.  Efficient Sampling Startup for Sampled Processor Simulation , 2005, HiPEAC.

[42]  Stijn Eyerman,et al.  Efficient Design Space Exploration of High Performance Embedded Out-of-Order Processors , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[43]  Kevin Skadron,et al.  Minimal subset evaluation: rapid warm-up for simulated hardware state , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[44]  J. P. Bigus Applying neural networks to computer system performance tuning , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).

[45]  Kevin Skadron,et al.  Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation , 2002 .

[46]  Douglas M. Hawkins,et al.  A statistically rigorous approach for improving simulation methodology , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[47]  Lieven Eeckhout,et al.  Control flow modeling in statistical simulation for accurate and efficient processor design studies , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[48]  Lieven Eeckhout,et al.  BLRL: Accurate and Efficient Warmup for Sampled Processor Simulation , 2005, Comput. J..

[49]  Sally A. McKee,et al.  An Approach to Performance Prediction for Parallel Applications , 2005, Euro-Par.

[50]  James E. Smith,et al.  Statistical Simulation: Adding Efficiency to the Computer Designer's Toolbox , 2003, IEEE Micro.

[51]  Rastislav Bodík,et al.  Interaction cost and shotgun profiling , 2004, TACO.