Accelerating Graph Analytics on CPU-FPGA Heterogeneous Platform

Hardware accelerators for graph analytics have gained increasing interest. Vertex-centric and edge-centric paradigms are widely used to design graph analytics accelerators. However, both of them have notable drawbacks: vertex-centric paradigm requires random memory accesses to traverse edges and edge-centric paradigm results in redundant edge traversals. In this paper, we explore the tradeoffs between vertex-centric and edge-centric paradigms and propose a hybrid algorithm which dynamically selects between them during the execution. We introduce the notion of active vertex ratio, based on which we develop a simple but efficient paradigm selection approach. We develop a hybrid data structure to concurrently support vertex-centric and edge-centric paradigms. Based on the hybrid data structure, we propose a graph partitioning scheme to increase parallelism and enable efficient parallel computation on heterogeneous platforms. In each iteration, we use our paradigm selection approach to select the appropriate paradigm for each partition. Further, we map our hybrid algorithm onto a stateof- the-art heterogeneous platform which integrates a multi-core CPU and a Field-Programmable Gate Array (FPGA) in a cache coherent fashion. We use our design methodology to accelerate two fundamental graph algorithms, breadth-first search (BFS) and single-source shortest path (SSSP). Experimental results show that our CPU-FPGA co-processing achieves up to 1.5× (1.9×) speedup for BFS (SSSP) compared with optimized baseline designs. Compared with the state-of-the-art FPGA-based designs, our design achieves up to 4.0× (4.2×) throughput improvement for BFS (SSSP). Compared with a state-of-the-art multi-core design, our design demonstrates up to 1.5× (1.8×) speedup for BFS (SSSP).

[1]  Phillip H. Jones,et al.  CyGraph: A Reconfigurable Architecture for Parallel Breadth-First Search , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.

[2]  Magnus Jahre,et al.  Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform , 2015, 2015 25th International Conference on Field Programmable Logic and Applications (FPL).

[3]  Heiner Giefers,et al.  Energy-efficient stochastic matrix function estimator for graph analytics on FPGA , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).

[4]  Yao Wang,et al.  Aggressive pipelining of irregular applications on reconfigurable hardware , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[5]  Viktor K. Prasanna,et al.  A flexible and scalable high-performance OpenFlow switch on heterogeneous SoC platforms , 2014, 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC).

[6]  Yong Dou,et al.  An FPGA Implementation for Solving the Large Single-Source-Shortest-Path Problem , 2016, IEEE Transactions on Circuits and Systems II: Express Briefs.

[7]  Viktor K. Prasanna,et al.  Optimizing memory performance for FPGA implementation of pagerank , 2015, 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig).

[8]  Viktor K. Prasanna,et al.  OSCAR: Optimizing SCrAtchpad reuse for graph processing , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).

[9]  John D. Owens,et al.  Gunrock: a high-performance graph processing library on the GPU , 2015, PPoPP.

[10]  Kunle Olukotun,et al.  GraphOps: A Dataflow Library for Graph Analytics Acceleration , 2016, FPGA.

[11]  Kunle Olukotun,et al.  Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[12]  Yu Wang,et al.  ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture , 2017, FPGA.

[13]  Rafael Asenjo,et al.  Breadth-First Search on Heterogeneous Platforms: A Case of Study on Social Networks , 2016, 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

[14]  Willy Zwaenepoel,et al.  X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[15]  Viktor K. Prasanna,et al.  Accelerating Large-Scale Single-Source Shortest Path on FPGA , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.

[16]  Yu Wang,et al.  A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration , 2012, 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors.

[17]  Pradeep Dubey,et al.  GraphMat: High performance graph analytics made productive , 2015, Proc. VLDB Endow..

[18]  Nachiket Kapre,et al.  Spatial hardware implementation for sparse graph algorithms in GraphStep , 2011, TAAS.

[19]  Margaret Martonosi,et al.  Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[20]  James C. Hoe,et al.  GraphGen: An FPGA Framework for Vertex-Centric Graph Computation , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.

[21]  Viktor K. Prasanna,et al.  High-Throughput and Energy-Efficient Graph Processing on FPGA , 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[22]  Poland Alaa Khamis Advanced Multimedia and Ubiquitous Engineering , 2016, Lecture Notes in Electrical Engineering.

[23]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[24]  Jing Li,et al.  Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search , 2017, FPGA.