On-Chip Memory System Optimization Design for the FT64 Scientific Stream Accelerator

As application domains expand, hardware-managed memory structures such as caches are drawing attention for handling irregular stream applications. However, because a real application usually exhibits both regular and irregular stream characteristics, conventional stream register files, caches, and combinations of the two each have shortcomings. This article focuses on combining software- and hardware-managed memory structures and presents a new syncretic memory system based on the FT64 stream accelerator.
