Paper information - Area and time efficient implementations of matrix multiplication on FPGAs

Area and time efficient implementations of matrix multiplication on FPGAs

We develop new algorithms and architectures for matrix multiplication on configurable hardware. These designs significantly reduce the latency as well as the area. Our designs improve on the previous designs in terms of the area/speed metric, where speed denotes the maximum achievable running frequency. The area/speed metrics for the previous designs and our design are 14.45, 4.93, and 2.35, respectively, for 4 × 4 matrix multiplication. The latency of one of the previous designs is 0.57 μs, while our design takes 0.15 μs using 18% less area. The area of our designs is smaller by 11% to 46% compared with the best known systolic designs with the same latency for matrices of sizes 3 × 3 to 12 × 12. The performance improvements tend to grow with the problem size.

Viktor K. Prasanna | Ju-wook Jang | Seonil B. Choi

[1] Michael J. Flynn, et al. PAM-Blox: high performance FPGA design for adaptive computing, 1998, Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251).

[2] Luca Benini, et al. Regression-based RTL power modeling, 2000, TODE.

[3] Abbes Amira, et al. Accelerating Matrix Product on Reconfigurable Hardware for Signal Processing, 2001, FPL.

[4] Bevan M. Baas, et al. A low-power, high-performance, 1024-point FFT processor, 1999, IEEE J. Solid State Circuits.

[5] Wayne Luk, et al. A Reconfigurable Engine for Real-Time Video Processing, 1998, FPL.

[6] Viktor K. Prasanna, et al. Domain-Specific Modeling for Rapid System-Wide Energy Estimation of Reconfigurable Architectures, 2002.

[7] Viktor K. Prasanna, et al. On Synthesizing Optimal Family of Linear Systolic Arrays for Matrix Multiplication, 1991, IEEE Trans. Computers.

[8] Viktor K. Prasanna, et al. Energy-Efficient Matrix Multiplication on FPGAs, 2002, FPL.