Loop-Aware Memory Prefetching Using Code Block Working Sets
Uri C. Weiser | Shie Mannor | Yoav Etsion | Adi Fuchs
[1] Janak H. Patel, et al. Stride directed prefetching in scalar processors, 1992, MICRO.
[2] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.
[3] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[4] Thomas F. Wenisch, et al. Spatio-temporal memory streaming, 2009, ISCA '09.
[5] Kei Hiraki, et al. Access Map Pattern Matching for High Performance Data Cache Prefetch, 2011, J. Instr. Level Parallelism.
[6] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[7] K.J. Nesbit, et al. AC/DC: an adaptive data cache prefetcher, 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[8] Richard W. Vuduc, et al. When Prefetching Works, When It Doesn’t, and Why, 2012, TACO.
[9] Richard E. Kessler, et al. Evaluating stream buffers as a secondary cache replacement, 1994, Proceedings of 21 International Symposium on Computer Architecture.
[10] Dean M. Tullsen, et al. Fast thread migration via cache working set prediction, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[11] Yale N. Patt, et al. A two-level approach to making class predictions, 2003, 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the.
[12] Thomas F. Wenisch, et al. Spatial Memory Streaming, 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[13] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[14] Kathryn S. McKinley, et al. Guided region prefetching: a cooperative hardware/software approach, 2003, ISCA '03.
[15] Dean M. Tullsen, et al. Inter-core prefetching for multicore processors using migrating helper threads, 2011, ASPLOS XVI.
[16] Reena Panda, et al. B-Fetch: Branch Prediction Directed Prefetching for In-Order Processors, 2012, IEEE Computer Architecture Letters.
[17] A. Jaleel. Memory Characterization of Workloads Using Instrumentation-Driven Simulation: A Pin-based Memory Characterization of the SPEC CPU 2000 and SPEC CPU 2006 Benchmark Suites, 2022.
[18] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[19] Hsien-Hsin S. Lee, et al. Data Prefetching by Exploiting Global and Local Access Patterns, 2011, J. Instr. Level Parallelism.
[20] Onur Mutlu, et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[21] Onur Mutlu, et al. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[22] James E. Smith, et al. Data Cache Prefetching Using a Global History Buffer, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[23] Vijayalakshmi Srinivasan, et al. RECAP: A region-based cure for the common cold (cache), 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[24] Glenn Reinman, et al. Fetch directed instruction prefetching, 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[25] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[26] Onur Mutlu, et al. Address-value delta (AVD) prediction: increasing the effectiveness of runahead execution by exploiting regular memory allocation patterns, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[27] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[28] Mahmut T. Kandemir, et al. Application-aware prefetch prioritization in on-chip networks, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[29] Santosh G. Abraham, et al. Effective stream-based and execution-based data prefetching, 2004, ICS '04.
[30] Douglas J. Joseph, et al. Prefetching Using Markov Predictors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[31] A Memo on Exploration of SPLASH-2 Input Sets, 2011.
[32] Collin McCurdy, et al. Diagnosis and optimization of application prefetching performance, 2013, ICS '13.
[33] Onur Mutlu, et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors, 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.