Energy- and time-efficient matrix multiplication on FPGAs

We develop new algorithms and architectures for matrix multiplication on configurable devices. They reduce energy dissipation and latency compared with state-of-the-art field-programmable gate array (FPGA)-based designs. By profiling well-known designs, we identify "energy hot spots" that are responsible for most of the energy dissipation. Based on this profiling, we develop algorithms and architectures that offer tradeoffs among the number of I/O ports, the number of registers, and the number of processing elements (PEs). To avoid time-consuming low-level simulations for energy profiling and performance prediction of many alternative designs, we derive functions that represent the impact of algorithm design choices on system-wide energy dissipation, area, and latency. These functions are used either to optimize energy performance or to provide tradeoffs for a family of candidate algorithms and architectures. For selected designs, we perform extensive low-level simulations using state-of-the-art tools and target FPGA devices. We present a design space for matrix multiplication on FPGAs that exposes tradeoffs among energy, area, and latency. For example, our designs improve the energy performance of state-of-the-art FPGA-based designs by 29%-51% without any increase in the area-latency product. The latency of our designs is reduced to between one-third and one-fifteenth, while area increases by a factor of 1.9-9.4. On comprehensive metrics such as Energy-Area-Time, our designs outperform the state of the art by 50%-79%.
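The idea of replacing low-level simulation with closed-form functions of the design parameters can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Design` fields mirror the three parameters named in the abstract, but the cost formulas, the per-unit constants in `AREA` and `POWER`, and the helper names are hypothetical placeholders and are not the functions or measurements derived in the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Design:
    num_pes: int       # number of processing elements (PEs)
    regs_per_pe: int   # registers in each PE
    io_ports: int      # number of I/O ports


# Placeholder per-unit costs in arbitrary units; NOT measured values.
AREA = {"pe": 100.0, "reg": 4.0, "io": 20.0}
POWER = {"pe": 5.0, "reg": 0.2, "io": 3.0}


def area(d: Design) -> float:
    """Toy area model: sum of per-component areas."""
    return (AREA["pe"] * d.num_pes
            + AREA["reg"] * d.num_pes * d.regs_per_pe
            + AREA["io"] * d.io_ports)


def latency(d: Design, n: int) -> float:
    """Toy latency model for an n x n product, in cycles."""
    compute = n ** 3 / d.num_pes        # multiply-accumulates shared by the PEs
    transfer = 3 * n * n / d.io_ports   # two inputs and one output matrix
    return compute + transfer


def energy(d: Design, n: int) -> float:
    """Toy energy model: total component power times latency."""
    power = (POWER["pe"] * d.num_pes
             + POWER["reg"] * d.num_pes * d.regs_per_pe
             + POWER["io"] * d.io_ports)
    return power * latency(d, n)


def eat(d: Design, n: int) -> float:
    """Energy-Area-Time product for one design."""
    return energy(d, n) * area(d) * latency(d, n)


def pareto_front(designs, n):
    """Keep designs that no other design beats on energy, area, and latency."""
    def metrics(d):
        return (energy(d, n), area(d), latency(d, n))

    front = []
    for d in designs:
        md = metrics(d)
        dominated = any(
            metrics(o) != md and all(x <= y for x, y in zip(metrics(o), md))
            for o in designs
        )
        if not dominated:
            front.append(d)
    return front


# Enumerate a small, hypothetical family of candidate designs.
candidates = [Design(p, r, io)
              for p, r, io in product((4, 8, 16), (2, 4), (1, 2))]
n = 16
best_energy = min(candidates, key=lambda d: energy(d, n))
best_eat = min(candidates, key=lambda d: eat(d, n))
front = pareto_front(candidates, n)
```

With functions of this kind, a whole family of candidates can be ranked by energy or by Energy-Area-Time in a fraction of a second, and only the designs on the resulting tradeoff front need to be carried forward to low-level simulation.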
