Accelerating Large-Scale Single-Source Shortest Path on FPGA

Many real-world problems can be represented as graphs and solved with graph traversal algorithms. Single-Source Shortest Path (SSSP) is a fundamental graph algorithm. Today's large-scale graphs contain millions or even billions of vertices, which makes efficient parallel graph processing challenging. In this paper, we propose a single-FPGA design that accelerates SSSP on massive graphs. We adopt the well-known Bellman-Ford algorithm. In the proposed design, the graph is stored in external memory, which is more realistic for processing large-scale graphs. Our design achieves the maximum data parallelism that the available external memory bandwidth allows, concurrently processing multiple edges in each clock cycle regardless of data dependencies. The performance of our design is also independent of the graph structure. We propose an optimized data layout that enables efficient utilization of external memory bandwidth. We prototype our design on a state-of-the-art FPGA. Experimental results show that our design processes 1.6 billion traversed edges per second (1.6 GTEPS) on a single FPGA while achieving a high clock rate of over 200 MHz. This would place our design in the 131st position on the Graph 500 benchmark list of supercomputing systems for data-intensive applications. Our solution therefore provides performance comparable to that of state-of-the-art systems.
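The abstract describes an edge-centric Bellman-Ford in which the edge list is streamed from external memory and several edges are relaxed in each clock cycle. The Python sketch below is only an illustrative software model of that idea under stated assumptions, not the authors' hardware design: the function name `bellman_ford_edge_centric`, the `edges_per_cycle` parameter and the batch update rule are hypothetical. All edges in one batch read distances as they were before the batch (modeling concurrent reads), and updates that target the same vertex are combined with a min. Bellman-Ford tolerates such stale reads because distances only ever decrease, so after pass k every distance is still no larger than the shortest path using at most k edges. The sketch assumes the graph has no negative-weight cycles.

```python
import math
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source vertex, destination vertex, weight)


def bellman_ford_edge_centric(num_vertices: int, edges: List[Edge], source: int,
                              edges_per_cycle: int = 8) -> Tuple[List[float], int]:
    """Edge-centric Bellman-Ford that streams the whole edge list each pass.

    Hypothetical model: `edges_per_cycle` edges are relaxed together per
    simulated clock cycle. Assumes no negative-weight cycles.
    Returns (distances, number of simulated cycles).
    """
    dist = [math.inf] * num_vertices
    dist[source] = 0.0
    cycles = 0
    for _ in range(num_vertices - 1):  # at most |V| - 1 passes are needed
        updated = False
        for start in range(0, len(edges), edges_per_cycle):
            batch = edges[start:start + edges_per_cycle]
            cycles += 1
            # Every edge in the batch reads dist[] before any write in this
            # batch, mimicking concurrent processing that ignores dependencies.
            proposals = {}
            for u, v, w in batch:
                candidate = dist[u] + w
                if candidate < proposals.get(v, math.inf):
                    proposals[v] = candidate  # min-reduce conflicting targets
            # Write back only genuine improvements.
            for v, candidate in proposals.items():
                if candidate < dist[v]:
                    dist[v] = candidate
                    updated = True
        if not updated:  # no change in a full pass: distances have converged
            break
    return dist, cycles


if __name__ == "__main__":
    edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
    dist, cycles = bellman_ford_edge_centric(4, edges, source=0, edges_per_cycle=2)
    print(dist)    # [0.0, 3.0, 1.0, 4.0]
    print(cycles)  # 9
```

In the example, the stale read of `dist[1]` in the second batch of pass 1 delays the final value of vertex 3 until pass 2, which is the cost of ignoring intra-batch dependencies. The default of 8 edges per cycle is purely illustrative; it matches the arithmetic 1.6 × 10^9 edges/s ÷ 200 × 10^6 cycles/s = 8 edges per cycle.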
