GCN Inference Acceleration using High-Level Synthesis

GCN (Graph Convolutional Network) has become a promising solution for many applications, such as recommendation systems and social data mining. Many of these applications require low-latency GCN inference. In this paper, we provide a case study of GCN inference acceleration on FPGA. We explore the high-level synthesis (HLS) programming model to achieve low-latency inference. First, we propose a partition-centric mapping strategy that maps the execution tasks of GCN onto the FPGA to exploit data reuse, which reduces external memory access overhead. Second, we provide an HLS-based kernel design with improved memory performance that achieves massive data parallelism. Third, we perform design space exploration to facilitate a feasible pre-placement that avoids potential Place-and-Route (PnR) failures. We evaluate our design on a state-of-the-art FPGA platform using three commonly used datasets: Reddit, Yelp, and Amazon-2M. We compare our design with two state-of-the-art libraries, PyTorch-Geometric (PyG) and Deep Graph Library (DGL), running on a high-end CPU and GPU, evaluating their latency and energy efficiency for full-batch GCN inference on a two-layer Vanilla-GCN model. Compared with the PyG CPU version, our design reduces the latency by 59.95× and is 96.22× more energy efficient on average. Compared with the DGL CPU version, our design achieves a 2.9×–6.4× speedup and is 5.87× more energy efficient. Compared with the DGL GPU version, although the latency of our design is 1.67×–2.5× that of DGL on GPU, our design is 1.8× more energy efficient.
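
To make the abstract's reference to an HLS-based kernel design concrete, the sketch below shows what a Vitis-HLS-style feature-aggregation kernel for one graph partition could look like. It is a minimal illustration under assumed conventions: the function name, partition and feature-dimension bounds, interface bundles, and unroll/pipeline factors are hypothetical and are not the paper's actual kernel.

```cpp
// Illustrative HLS-style sketch (assumed names and sizes, not the paper's design).
// Aggregates neighbor features within one partition: y[v] = sum_{u in N(v)} w(v,u) * x[u].

#define MAX_NODES 1024   // assumed per-partition node budget
#define FEAT_DIM  128    // assumed feature width

extern "C" void feature_aggregate(
    const int   *row_ptr,   // CSR row pointers of the partition's adjacency
    const int   *col_idx,   // CSR column indices (neighbor IDs within the partition)
    const float *edge_val,  // normalized edge weights
    const float *x,         // input features, row-major [num_nodes][FEAT_DIM]
    float       *y,         // aggregated output features
    int          num_nodes)
{
#pragma HLS INTERFACE m_axi port=row_ptr  bundle=gmem0
#pragma HLS INTERFACE m_axi port=col_idx  bundle=gmem0
#pragma HLS INTERFACE m_axi port=edge_val bundle=gmem1
#pragma HLS INTERFACE m_axi port=x        bundle=gmem2
#pragma HLS INTERFACE m_axi port=y        bundle=gmem3

    // On-chip buffer so a partition's features are read from external memory once
    // and reused across all of its edges (the data-reuse idea behind partitioning).
    static float x_buf[MAX_NODES][FEAT_DIM];
#pragma HLS ARRAY_PARTITION variable=x_buf cyclic factor=8 dim=2

load_features:
    for (int v = 0; v < num_nodes; ++v) {
        for (int f = 0; f < FEAT_DIM; ++f) {
#pragma HLS PIPELINE II=1
            x_buf[v][f] = x[v * FEAT_DIM + f];
        }
    }

aggregate:
    for (int v = 0; v < num_nodes; ++v) {
        float acc[FEAT_DIM] = {0.0f};
#pragma HLS ARRAY_PARTITION variable=acc cyclic factor=8
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
            const int   u = col_idx[e];
            const float w = edge_val[e];
            for (int f = 0; f < FEAT_DIM; ++f) {
#pragma HLS UNROLL factor=8
                acc[f] += w * x_buf[u][f];   // feature-dimension parallelism
            }
        }
        for (int f = 0; f < FEAT_DIM; ++f) {
#pragma HLS PIPELINE II=1
            y[v * FEAT_DIM + f] = acc[f];
        }
    }
}
```

The structure reflects the general partition-centric pattern the abstract describes: buffering a partition's features on chip to cut external memory traffic, then unrolling along the feature dimension for data parallelism. Concrete buffer sizes, unroll factors, and memory-port assignments would come from the design space exploration discussed in the paper.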