Time and Energy Efficient Matrix Factorization Using FPGAs

In this paper, new algorithms and architectures for matrix factorization are presented. Two fully parallel, block-based designs for LU decomposition on configurable devices are proposed. A linear array architecture is employed to minimize the use of long interconnects, leading to lower energy dissipation. The designs are made scalable by using a fixed I/O bandwidth that is independent of the problem size. High-level models for energy profiling are built, and the energy performance of many possible designs is predicted. Through an analysis of the design tradeoffs, the block size that minimizes the total energy dissipation is identified. A set of candidate designs was implemented on the Xilinx Virtex-II to verify the estimates. The performance of our designs is also compared with that of state-of-the-art DSP-based designs and with that of designs produced by a state-of-the-art commercial compilation tool, Celoxica DK1. In both cases, our FPGA designs are significantly more time and energy efficient.

Viktor K. Prasanna | Seonil B. Choi
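The abstract does not spell out the factorization itself. As a generic illustration of what "block-based" LU decomposition means, here is a minimal Python sketch of right-looking blocked LU factorization (Doolittle form, no pivoting). It assumes a square matrix whose pivots never vanish, for example a strictly diagonally dominant one. This is a textbook software formulation, not the authors' FPGA linear-array design. The names `blocked_lu`, `split_lu`, and `matmul` and the block-size parameter `b` are hypothetical and used only for illustration. `b` loosely corresponds to the block size whose energy tradeoff the paper analyzes.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def blocked_lu(a: Matrix, b: int) -> Matrix:
    """Return a copy of square matrix `a` overwritten with its LU factors.

    Doolittle form, no pivoting (an assumption of this sketch): the strict
    lower triangle holds L (unit diagonal implied), the upper triangle holds U.
    `b` is the block (panel) width.
    """
    if b < 1:
        raise ValueError("block size must be >= 1")
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy
    for k0 in range(0, n, b):
        k1 = min(k0 + b, n)
        # 1) Panel: unblocked LU of columns k0..k1-1 over rows k0..n-1,
        #    producing L11, U11 and L21.
        for k in range(k0, k1):
            pivot = a[k][k]
            if pivot == 0.0:
                raise ZeroDivisionError(f"zero pivot at row {k}; no pivoting here")
            for i in range(k + 1, n):
                a[i][k] /= pivot
                lik = a[i][k]
                for j in range(k + 1, k1):
                    a[i][j] -= lik * a[k][j]
        # 2) Block row of U: U12 = inv(L11) @ A12 via forward substitution
        #    (L11 has a unit diagonal, so no division is needed).
        for j in range(k1, n):
            for k in range(k0, k1):
                ukj = a[k][j]
                for i in range(k + 1, k1):
                    a[i][j] -= a[i][k] * ukj
        # 3) Trailing update (Schur complement): A22 -= L21 @ U12.
        for i in range(k1, n):
            for j in range(k1, n):
                a[i][j] -= sum(a[i][k] * a[k][j] for k in range(k0, k1))
    return a


def split_lu(f: Matrix) -> Tuple[Matrix, Matrix]:
    """Unpack the combined factor matrix into explicit L and U."""
    n = len(f)
    lower = [[f[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[f[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain triple-loop matrix product, used only to check L @ U == A."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    A = [[10, 2, 3, 1], [2, 12, 1, 4], [3, 1, 15, 2], [1, 4, 2, 11]]
    for b in (1, 2, 3, 4):
        L, U = split_lu(blocked_lu(A, b))
        err = max(abs(p - q) for rp, rq in zip(matmul(L, U), A) for p, q in zip(rp, rq))
        print(f"block size {b}: max |LU - A| = {err:.2e}")
```

Every block size yields the same factors up to rounding. Only the grouping and order of the operations change, so `b` can be chosen for implementation reasons, such as the storage available per processing element or the data moved per step, without changing the result.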