A message-passing multi-softcore architecture on FPGA for Breadth-first Search

Breadth-first Search (BFS) is a fundamental graph problem. Because memory accesses to graph data structures are irregular, parallelizing BFS on cache-based systems leads to poor performance. Many factors, such as memory access latency, cache coherence policy, and inter-process synchronization, limit the throughput of BFS on such systems. In our proposed message-passing multi-softcore architecture, parallelization is achieved by exchanging information among autonomous softcores on an FPGA. Several optimizations reduce the traffic on the interconnect and enable designs with high clock rates. Implementations on a state-of-the-art FPGA achieve clock rates in excess of 100 MHz. The sustained performance of our system ranges from 160 to 795 million edges per second on a DDR3 DRAM. This result approaches the upper bound set by the DRAM bandwidth, and it rivals the best performance reported for implementations on various multi-core computing platforms.
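The idea of parallelizing BFS by exchanging messages among autonomous cores can be illustrated with a small software sketch. This is not the hardware design described above. It assumes a level-synchronous traversal, a hypothetical ownership rule in which vertex `v` belongs to core `v % num_cores`, and plain Python lists standing in for the interconnect. Each simulated core keeps private state for the vertices it owns and learns about newly reached vertices only through its inbox.

```python
def message_passing_bfs(adj, source, num_cores=4):
    """Level-synchronous BFS in which simulated cores communicate only by messages.

    adj: dict mapping each vertex (int) to a list of neighbor vertices.
    Returns a dict mapping each reachable vertex to its BFS level.
    The modulo ownership rule and the inbox lists are illustrative assumptions.
    """
    def owner(v):
        # Hypothetical partitioning: vertex v is owned by core v % num_cores.
        return v % num_cores

    # Private per-core state: levels of owned vertices and the current frontier.
    levels = [dict() for _ in range(num_cores)]
    frontiers = [[] for _ in range(num_cores)]

    levels[owner(source)][source] = 0
    frontiers[owner(source)].append(source)
    depth = 0

    while any(frontiers):
        # Send phase: each core expands its frontier and posts every neighbor
        # to the inbox of the core that owns that neighbor.
        inboxes = [[] for _ in range(num_cores)]
        for core in range(num_cores):
            for u in frontiers[core]:
                for v in adj.get(u, ()):
                    inboxes[owner(v)].append(v)

        # Receive phase: each core checks only its own private state, so no
        # shared visited array or cache coherence is needed.
        depth += 1
        frontiers = [[] for _ in range(num_cores)]
        for core in range(num_cores):
            for v in inboxes[core]:
                if v not in levels[core]:
                    levels[core][v] = depth
                    frontiers[core].append(v)

    # Merge the per-core results for inspection.
    merged = {}
    for core_levels in levels:
        merged.update(core_levels)
    return merged


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
result = message_passing_bfs(graph, source=0, num_cores=2)
```

In this sketch vertex 3 is sent twice in the second step, once from vertex 1 and once from vertex 2. Duplicate messages of this kind are one source of interconnect traffic; the sketch simply discards the second copy at the receiving core.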
