Optimizing Matrix Multiplication on Heterogeneous Reconfigurable Systems

With rapid advances in technology, FPGAs have become an attractive option for accelerating scientific applications. In particular, reconfigurable computing systems have been built that combine FPGAs and general-purpose processors to achieve high performance. Previous work assumes that the nodes in such systems are homogeneous, with each node containing both processors and FPGAs. In reality, however, the nodes can be heterogeneous: a node may contain only FPGAs, only processors, or both. In this paper, we model these heterogeneous reconfigurable systems using several parameters, including the computing capacities of the nodes, the memory size, the memory bandwidth, and the network bandwidth. Based on the model, we propose a design for matrix multiplication that fully utilizes the computing capacity of a system and adapts to various heterogeneous settings. To illustrate our ideas, we implement the proposed design on a Cray XD1. Heterogeneous nodes are created by using only the FPGAs or only the processors in some nodes. Experimental results show that our design achieves up to 80% of the total computing capacity of the system and more than 90% of the performance predicted by the model.
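The abstract does not spell out how work is divided among nodes. As a rough illustration of adapting to heterogeneous settings, the sketch below splits the rows of the output matrix among nodes in proportion to each node's computing capacity, so that all nodes would finish at about the same time. This is an assumption for illustration, not the paper's actual design: the `Node` type, the `gflops` field, and `partition_rows` are hypothetical names, and the sketch ignores the memory size, memory bandwidth, and network bandwidth parameters that the model also includes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical description of one node in the system."""
    name: str
    gflops: float  # assumed sustained computing capacity of the node


def partition_rows(n_rows: int, nodes: List[Node]) -> List[int]:
    """Split n_rows among nodes in proportion to their capacity.

    Uses the largest-remainder method so the integer shares
    always sum to exactly n_rows.
    """
    total = sum(node.gflops for node in nodes)
    exact = [n_rows * node.gflops / total for node in nodes]
    rows = [int(x) for x in exact]  # round every share down first
    leftover = n_rows - sum(rows)
    # Give the remaining rows to the nodes with the largest fractional parts.
    order = sorted(range(len(nodes)),
                   key=lambda i: exact[i] - rows[i],
                   reverse=True)
    for i in order[:leftover]:
        rows[i] += 1
    return rows


# Illustrative capacities only; these are not measured values.
nodes = [
    Node("processor_only", 1.0),
    Node("fpga_only", 2.0),
    Node("processor_and_fpga", 3.0),
]
shares = partition_rows(600, nodes)
```

With the illustrative capacities above, the node with both a processor and an FPGA receives half of the rows. A fuller treatment would also have to check that each share fits in the node's memory and that data transfer does not become the bottleneck.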
