A pipelined-loop-compatible architecture and algorithm to reduce variable-length sets of floating-point data on a reconfigurable computer

Reconfigurable computers (RCs) combine general-purpose processors (GPPs) with field programmable gate arrays (FPGAs). The FPGAs are reconfigured at run time to become application-specific processors that collaborate with the GPPs to execute the application. High-level language (HLL) to hardware description language (HDL) compilers allow the FPGA-based kernels to be generated using HLL-based programming rather than HDL-based hardware design. Unfortunately, the loops needed for floating-point reduction operations often cannot be pipelined by these HLL-HDL compilers. This capability gap prevents the development of a number of important FPGA-based kernels. This article describes a novel architecture and algorithm that allow the use of an HLL-HDL environment to implement high-performance FPGA-based kernels that reduce multiple, variable-length sets of floating-point data. A sparse matrix iterative solver is used to demonstrate the effectiveness of the reduction kernel. The FPGA-augmented version running on a contemporary RC is up to 2.4 times faster than the software-only version of the same solver running on the GPP. Conservative estimates show the solver will run up to 6.3 times faster than software on a next-generation RC.

[1]  Viktor K. Prasanna,et al.  A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[2]  Tarek A. El-Ghazawi,et al.  Performance of sorting algorithms on the SRC 6 reconfigurable computer , 2005, Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005..

[3]  Viktor K. Prasanna,et al.  High Performance Linear Algebra Operations on Reconfigurable Systems , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[4]  Viktor K. Prasanna,et al.  Designing scalable FPGA-based reduction circuits using pipelined floating-point cores , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[5]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[6]  Miriam Leeser,et al.  Advanced Components in the Variable Precision Floating-Point Library , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[7]  Viktor K. Prasanna,et al.  A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing , 2005, ERSA.

[8]  Yong Dou,et al.  64-bit floating-point FPGA matrix multiplication , 2005, FPGA '05.

[9]  Viktor K. Prasanna,et al.  An FPGA-based floating-point Jacobi iterative solver , 2005, 8th International Symposium on Parallel Architectures,Algorithms and Networks (ISPAN'05).

[10]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[11]  J. Shewchuk An Introduction to the Conjugate Gradient Method Without the Agonizing Pain , 1994 .

[12]  Keith D. Underwood,et al.  FPGAs vs. CPUs: trends in peak floating-point performance , 2004, FPGA '04.

[13]  Alok N. Choudhary,et al.  FPGA hardware synthesis from MATLAB , 2001, VLSI Design 2001. Fourteenth International Conference on VLSI Design.

[14]  Alex K. Jones,et al.  A MATLAB compiler for distributed, heterogeneous, reconfigurable computing systems , 2000, Proceedings 2000 IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00871).

[15]  Gerald Estrin,et al.  Organization of computer systems: the fixed plus variable structure computer , 1960, IRE-AIEE-ACM '60 (Western).

[16]  M. Hestenes,et al.  Methods of conjugate gradients for solving linear systems , 1952 .

[17]  Jeffrey Hammes,et al.  A Transformational Approach to High Performance Embedded Computing , 2004 .

[18]  Richard Barrett,et al.  Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.

[19]  Viktor K. Prasanna,et al.  High-performance and area-efficient reduction circuits on FPGAs , 2005, 17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'05).