Efficient Floating-point Based Block LU Decomposition on FPGAs

In this paper, we propose an architecture for floating-point based LU decomposition of large matrices. The architecture is based on the well-known concept of blocking and uses pipelined floating-point units to obtain high throughput. We first analyze how the block size and the deeply pipelined floating-point units affect the performance of the architecture. We then compare the performance of our double-precision design with that of a design based on a general-purpose processor (GPP). Initial results show that an improvement of up to 23x in total computation time can be achieved. Finally, we analyze the impact of algorithm-level design choices (varying the block size) on the system-wide energy dissipation and resource usage of our designs. Categories: 1. Theory, Mapping and Parallelization and 4. Applications
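The blocking idea mentioned in the abstract can be illustrated with a minimal software sketch. This is not the hardware architecture proposed in the paper: it is a plain right-looking block LU decomposition without pivoting, written in pure Python, and the names `block_lu`, `reconstruct`, and the block-size parameter `b` are hypothetical choices made only for illustration. It assumes every pivot is nonzero (for example, a diagonally dominant or positive definite matrix).

```python
def block_lu(a, b):
    """Right-looking block LU without pivoting (illustrative sketch).

    Returns one matrix holding L below the diagonal (unit diagonal implied)
    and U on and above the diagonal. Assumes all pivots are nonzero.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    for k in range(0, n, b):
        e = min(k + b, n)  # current diagonal block is rows/cols k..e-1
        # Step 1: unblocked LU of the diagonal block.
        for j in range(k, e):
            for i in range(j + 1, e):
                lu[i][j] /= lu[j][j]
                for c in range(j + 1, e):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # Step 2: row panel, solve L11 * U12 = A12 by forward substitution.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # Step 3: column panel, solve L21 * U11 = A21.
        for j in range(k, e):
            for i in range(e, n):
                lu[i][j] /= lu[j][j]
                for c in range(j + 1, e):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # Step 4: trailing update, A22 -= L21 * U12 (a block matrix multiply).
        for i in range(e, n):
            for j in range(k, e):
                for c in range(e, n):
                    lu[i][c] -= lu[i][j] * lu[j][c]
    return lu


def reconstruct(lu):
    """Multiply the packed factors back together: returns L * U."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for m in range(min(i, j) + 1):
                l_im = 1.0 if m == i else lu[i][m]  # unit diagonal of L
                s += l_im * lu[m][j]
            out[i][j] = s
    return out
```

Calling `block_lu(a, b)` with different values of `b` should give the same factors up to rounding; only the order of operations changes. In this sketch, the trailing update in step 4 is a matrix multiplication and accounts for most of the arithmetic when the matrix is much larger than the block.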
