An FPGA-Based Application-Specific Processor for Efficient Reduction of Multiple Variable-Length Floating-Point Data Sets

Reconfigurable computers (RCs) that combine generalpurpose processors with field-programmable gate arrays (FPGAs) are now available. In these exciting systems, the FPGAs become reconfigurable application-specific processors (ASPs). Specialized high-level language (HLL) to hardware description language (HDL) compilers allow these ASPs to be reconfigured using HLLs. In our research we describe a novel toroidal data structure and scheduling algorithm that allows us to use an HLL-to-HDL environment to implement a high-performance ASP that reduces multiple, variable-length sets of 64-bit floating-point data. We demonstrate the effectiveness of our ASP by using it to accelerate a sparse matrix iterative solver. We compare actual wall clock run times of a production-quality software iterative solver with an ASP-augmented version of the same solver on a current generation RC. Our ASP-augmented solver runs up to 2.4 times faster than software. Estimates show that this same design can run over 6.4 times faster on a next-generation RC.

[1]  Gerald Estrin,et al.  Organization of computer systems: the fixed plus variable structure computer , 1960, IRE-AIEE-ACM '60 (Western).

[2]  Youcef Saad,et al.  SPARSKIT: a basic tool kit for sparse matrix computations - Version 2 , 1990 .

[3]  Yvon Savaria,et al.  A flexible floating-point format for optimizing data-paths and operators in FPGA based DSPs , 2002, FPGA '02.

[4]  Viktor K. Prasanna,et al.  High-performance and area-efficient reduction circuits on FPGAs , 2005, 17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'05).

[5]  Viktor K. Prasanna,et al.  An FPGA-based floating-point Jacobi iterative solver , 2005, 8th International Symposium on Parallel Architectures,Algorithms and Networks (ISPAN'05).

[6]  Wayne Luk,et al.  Automating custom-precision function evaluation for embedded processors , 2005, CASES '05.

[7]  Brent E. Nelson,et al.  Higher radix floating-point representations for FPGA-based arithmetic , 2005, 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05).

[8]  Viktor K. Prasanna,et al.  Designing scalable FPGA-based reduction circuits using pipelined floating-point cores , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[9]  Viktor K. Prasanna,et al.  A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[10]  Jeffrey Hammes,et al.  A Transformational Approach to High Performance Embedded Computing , 2004 .

[11]  Viktor K. Prasanna,et al.  High Performance Linear Algebra Operations on Reconfigurable Systems , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[12]  Liang-Gee Chen,et al.  JPEG, MPEG-4, and H.264 codec IP development , 2005, Design, Automation and Test in Europe.

[13]  Yong Dou,et al.  64-bit floating-point FPGA matrix multiplication , 2005, FPGA '05.

[14]  André DeHon,et al.  Floating-point sparse matrix-vector multiply for FPGAs , 2005, FPGA '05.

[15]  Viktor K. Prasanna,et al.  A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing , 2005, ERSA.