Cache streamization for high performance stream processor

Because stream applications place high bandwidth demands on the memory system, most stream processors use software-managed streaming memory. However, this kind of memory makes programming harder, reduces compatibility, and supports irregular stream accesses poorly, all of which hinder the use of stream processors in broader application domains. Hardware-managed coherent caches overcome these shortcomings, but at the cost of side effects caused by their lack of stream support. To address this problem, this paper develops a streamization cache whose performance is comparable to that of streaming memory but which is easier to use. The paper presents the motivation and details of the proposed design, including three stream-specific cache techniques covering the data fetch policy, the replacement policy, and multi-client access. Moreover, an instance of the streamization cache is implemented in FT64, a 64-bit high-performance stream processor. Using a set of streaming application benchmarks, the paper evaluates the performance, power consumption, and area cost of the proposed architecture. The results show that these streamization techniques for caches are worthwhile.
