GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics

Vertex centric models for large scale graph processing are gaining traction due to their simple distributed programming abstraction. However, pure vertex centric algorithms under-perform due to large communication overheads and slow iterative convergence. We introduce GoFFish a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters, offering the added natural flexibility of shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model and empirically analyze them for several real world graphs, demonstrating orders of magnitude improvements, in some cases, compared to Apache Giraph’s vertex centric framework.

[1]  Yogesh L. Simmhan,et al.  Optimizations and Analysis of BSP Graph Processing Models on Public Clouds , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[2]  Kamesh Madduri,et al.  Parallel breadth-first search on distributed memory systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[3]  Nancy M. Amato,et al.  The STAPL Parallel Graph Library , 2012, LCPC.

[4]  Jin-Soo Kim,et al.  HAMA: An Efficient Matrix Computation with the MapReduce Framework , 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.

[5]  Jimeng Sun,et al.  DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[6]  Christos Faloutsos,et al.  PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[7]  Douglas P. Gregor,et al.  The Parallel BGL : A Generic Library for Distributed Graph Computations , 2005 .

[8]  Leslie G. Valiant,et al.  Direct Bulk-Synchronous Parallel Algorithms , 1994, J. Parallel Distributed Comput..

[9]  Bingsheng He,et al.  Large graph processing in the cloud , 2010, SIGMOD Conference.

[10]  Jonathan W. Berry,et al.  Challenges in Parallel Graph Processing , 2007, Parallel Process. Lett..

[11]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[12]  Jennifer Widom,et al.  GPS: a graph processing system , 2013, SSDBM.

[13]  Vipin Kumar,et al.  Analysis of Multilevel Graph Partitioning , 1995, Proceedings of the IEEE/ACM SC95 Conference.

[14]  David A. Bader,et al.  Investigating Graph Algorithms in the BSP Model on the Cray XMT , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.

[15]  D. K. Arvind,et al.  Languages and Compilers for Parallel Computing , 2014, Lecture Notes in Computer Science.

[16]  P. J. Narayanan,et al.  Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.

[17]  Shirish Tatikonda,et al.  From "Think Like a Vertex" to "Think Like a Graph" , 2013, Proc. VLDB Endow..

[18]  Carlos Guestrin,et al.  Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[19]  Haixun Wang,et al.  Trinity: a distributed graph engine on a memory cloud , 2013, SIGMOD '13.

[20]  Jimmy J. Lin,et al.  Design patterns for efficient graph algorithms in MapReduce , 2010, MLG '10.