FT64: Scientific Computing with Streams

This paper describes FT64 and Multi-FT64, single- and multi-coprocessor systems designed for high-performance scientific computing with streams. We present a detailed case study of porting the Mersenne Prime Search problem to the FT64 and Multi-FT64 systems. We discuss several problems specific to streamizing, such as kernel processing granularity, stream organization, and workload partitioning across multiple processors, which apply generally to other scientific codes on FT64. Finally, we run experiments with eight typical scientific applications on FT64. The results show that a 500 MHz FT64 achieves over 50% of its peak performance and a peak speedup of 4.2x over a 1.6 GHz Itanium 2. An eight-processor Multi-FT64 system achieves a peak speedup of 6.8x over a single FT64.
