Efficient extraction of high centrality vertices in distributed graphs

Betweenness centrality (BC) is an important measure for identifying high-value or critical vertices in graphs, in a variety of domains such as communication networks, road networks, and social graphs. However, calculating betweenness values is prohibitively expensive, and domain experts are often interested only in the vertices with the highest centrality values. In this paper, we first propose a partition-centric algorithm (MS-BC) to calculate BC for a large distributed graph that optimizes resource utilization and improves overall performance. Further, we extend the notion of approximate BC by pruning the graph, removing a subset of edges and vertices that contribute the least to the betweenness values of other vertices (MSL-BC), which further improves the runtime performance. We evaluate the proposed algorithms using a mix of real-world and synthetic graphs on an HPC cluster and analyze their strengths and weaknesses. The experimental results show a performance improvement of up to 12× for large sparse graphs compared to the state of the art, and at the same time highlight the need for better partitioning methods to enable a balanced workload across partitions for unbalanced graphs such as small-world or power-law graphs.
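
For readers unfamiliar with the measure, the following is a minimal single-machine sketch of exact BC on an unweighted, undirected graph, using the well-known breadth-first accumulation scheme due to Brandes. It is not the MS-BC or MSL-BC algorithm proposed in the paper, which are partition-centric and distributed; the function names `betweenness_centrality` and `top_k` and the adjacency-dictionary input format are assumptions made for illustration only.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact BC for an unweighted, undirected graph.

    adj: dict mapping each vertex to an iterable of its neighbours
    (hypothetical input format chosen for this sketch).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}    # number of shortest s-v paths
        dist = {v: -1 for v in adj}    # BFS distance from s (-1 = unvisited)
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        sigma[s] = 1
        dist[s] = 0
        order = []                     # vertices in non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:        # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair is counted from both endpoints in an undirected graph.
    return {v: c / 2.0 for v, c in bc.items()}


def top_k(adj, k):
    """Return the k vertices with the highest BC (ties in arbitrary order)."""
    scores = betweenness_centrality(adj)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The sketch runs one breadth-first search per source vertex, so its cost grows with the product of the vertex and edge counts. This is the expense that motivates extracting only the highest-centrality vertices and, in the paper, distributing and pruning the computation.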
