Quickly finding a truss in a haystack

The k-truss of a graph is a subgraph such that each edge is tightly connected to the remaining elements in the k-truss. The k-truss of a graph can also represent an important community in the graph. Finding the k-truss of a graph can be done in a polynomial amount of time, in contrast to finding other subgraphs such as cliques. While there are numerous formulations and algorithms for finding the maximal k-truss of a graph, many of these tend to be computationally expensive and do not scale well. Many algorithms are iterative and rerun static graph triangle counting in each iteration. In this work we present a novel algorithm for finding both the k-truss of the graph (for a given k) and the maximal k-truss, using a dynamic graph formulation. Our algorithm has two main benefits. 1) Unlike many algorithms that rerun the static graph triangle counting after the removal of non-conforming edges, we use a new dynamic graph formulation that only requires updating the edges affected by the removal. As our updates are local, we do only a fraction of the work done by the other algorithms. 2) Our algorithm is extremely scalable and is able to detect deleted triangles concurrently, in contrast to past sequential approaches. While our algorithm is architecture independent, we show a CUDA-based implementation for NVIDIA GPUs. In numerous instances, our new algorithm is anywhere from 100X to 10000X faster than the Graph Challenge benchmark. Furthermore, our algorithm shows significant speedups, in some cases over 70X, over a recently developed, highly optimized sequential algorithm.
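The local-update idea can be illustrated with a minimal sequential sketch in Python. It is not the CUDA implementation described above, and it assumes the common definition in which every edge of a k-truss lies in at least k - 2 triangles of that subgraph. The function names `k_truss` and `max_truss`, and the adjacency-set representation, are hypothetical choices made for this illustration. Instead of recounting all triangles after each round of deletions, the sketch decrements the support only of the edges that shared a triangle with a removed edge.

```python
from collections import deque


def k_truss(edges, k):
    """Return the edges of the k-truss (assumed definition: each
    surviving edge lies in at least k - 2 triangles of the result)."""
    # Build an undirected adjacency-set representation, skipping self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def key(a, b):
        # Canonical (small, large) form so each edge has one dictionary key.
        return (a, b) if a < b else (b, a)

    # Initial support: the number of triangles containing each edge.
    support = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                support[(u, v)] = len(adj[u] & adj[v])

    # Edges that already violate the k - 2 threshold.
    queue = deque(e for e, s in support.items() if s < k - 2)
    queued = set(queue)

    while queue:
        u, v = queue.popleft()
        # Only triangles through (u, v) are destroyed by its removal, so
        # only the other two edges of each such triangle are updated.
        for w in adj[u] & adj[v]:
            for e in (key(u, w), key(v, w)):
                support[e] -= 1
                if support[e] < k - 2 and e not in queued:
                    queued.add(e)
                    queue.append(e)
        adj[u].discard(v)
        adj[v].discard(u)
        del support[(u, v)]

    return set(support)


def max_truss(edges):
    """Return (k, edge set) for the largest k with a non-empty k-truss."""
    k = 2
    best = k_truss(edges, k)
    while True:
        # The (k + 1)-truss is contained in the k-truss, so peel from there.
        candidate = k_truss(best, k + 1)
        if not candidate:
            return k, best
        k, best = k + 1, candidate
```

For example, on a 4-clique with one extra pendant edge, `k_truss(edges, 4)` keeps only the six clique edges, since each lies in two triangles while the pendant edge lies in none. A parallel version would have to make the support decrements atomic and coordinate concurrent deletions; the sketch leaves that out.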
