A Fast and Efficient Parallel Algorithm for Pruned Landmark Labeling

Shortest-distance querying based on hub labeling plays a key role in many important graph applications, such as route planning, socially-sensitive search and web page ranking. Over the last few years, Pruned Landmark Labeling (PLL) has emerged as the state-of-the-art technique for hub labeling. PLL drastically reduces the complexity of label construction by pruning Shortest-Path Trees (SPTs). However, PLL is inherently sequential, because the SPTs must be constructed in a specific order of source vertices to keep the labels small. For large graphs in particular, constructing even pruned SPTs from every vertex takes significant processing time. Although there are many works on parallelizing single-source shortest path, these solutions cannot be used directly for PLL, because pruning and label querying add significant complexity and restrict the parallelism available within an SPT. In this paper, we propose a novel, fast and efficient algorithm that significantly accelerates PLL on large graphs through a two-level parallelization of SPTs: intra-tree and inter-tree. For intra-tree parallelization, we generate pruned SPTs using a modification of the Bellman-Ford (BF) algorithm, which we further optimize to reduce SPT label querying and initialization costs. We implement our algorithm using the recently proposed Graph Processing Over Partitions (GPOP) framework, which dramatically improves cache efficiency and DRAM communication bandwidth. When pruned SPTs become very small and parallelizing individual SPTs is no longer advantageous, we switch to inter-tree parallelization and construct multiple trees concurrently in a batch. Experiments conducted on a 36-core (2-way hyperthreaded) Intel Broadwell server show that, on some datasets, our parallel algorithm achieves a speedup of more than 35.1× over the state-of-the-art sequential algorithm.
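To make the pruning idea concrete, the following is a minimal sketch of the sequential PLL baseline that the abstract refers to, restricted to unweighted, undirected graphs so that a breadth-first search can stand in for the SPT construction. It is not the parallel Bellman-Ford or GPOP-based algorithm proposed in the paper. The names `query`, `pruned_landmark_labeling`, `adj` and `order`, and the degree-based vertex ordering in the usage lines, are assumptions made for illustration.

```python
from collections import deque


def query(labels, u, v):
    """Return min over common hubs h of d(u, h) + d(h, v), or inf if none."""
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label
    best = float("inf")
    for hub, d in lu.items():
        other = lv.get(hub)
        if other is not None and d + other < best:
            best = d + other
    return best


def pruned_landmark_labeling(adj, order):
    """Build hub labels with one pruned BFS per root, in the given order.

    adj   : dict mapping each vertex to a list of neighbours (undirected)
    order : roots sorted by importance (an assumption here: degree, descending)
    """
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: labels from earlier roots already give a distance
            # no larger than d, so this subtree adds no information.
            if query(labels, root, u) <= d:
                continue
            labels[u][root] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels


# Usage on a 5-vertex path 0-1-2-3-4 plus an isolated vertex 5.
example_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
example_order = sorted(example_adj, key=lambda v: -len(example_adj[v]))
example_labels = pruned_landmark_labeling(example_adj, example_order)
print(query(example_labels, 0, 4))  # 4
```

Each root's search depends on the labels written by all earlier roots, which is the ordering constraint that makes the baseline hard to parallelize.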
