Performance Analysis of Vertex-centric Graph Algorithms on the Azure Cloud Platform

Finding key vertices in large graphs is an important problem in many applications, such as social networks, bioinformatics, and distribution networks. Betweenness centrality is a popular measure for identifying such vertices and has been studied extensively, yielding several parallel formulations suitable for supercomputers and clusters. In this paper we implement and study betweenness centrality on cloud-based platforms, using Microsoft Windows Azure as our case study. We demonstrate scalable parallel performance and investigate key issues in a cloud-based implementation, including mitigating the penalties associated with VM failures and the impact of communication overheads in the cloud. Our evaluation combines empirical and analytical methods and uses both synthetic small-world graphs and real-world social interaction graphs.

Keywords: Graph; Cloud computing; Azure; performance analysis; betweenness centrality; scalability
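For readers unfamiliar with the measure, the following is a minimal sequential sketch of betweenness centrality for unweighted graphs, following the well-known Brandes approach (one breadth-first search per source vertex, then a backward accumulation of path dependencies). It is an illustration only and not the Azure implementation studied in the paper; the function name `betweenness_centrality`, the adjacency-dictionary input format, and the `undirected` flag are assumptions made for this example.

```python
from collections import deque


def betweenness_centrality(adj, undirected=True):
    """Illustrative Brandes-style betweenness for an unweighted graph.

    adj: dict mapping each vertex to an iterable of its neighbours
         (hypothetical input format; for an undirected graph every
         edge must appear in both directions).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        order = []                      # vertices in non-decreasing distance
        preds = {v: [] for v in adj}    # shortest-path predecessors
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies from farthest to nearest.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if undirected:
        # Each unordered pair was counted once from each endpoint.
        bc = {v: score / 2.0 for v, score in bc.items()}
    return bc


if __name__ == "__main__":
    # A four-vertex path 0 - 1 - 2 - 3: only the two inner vertices
    # lie on shortest paths between other vertices.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(betweenness_centrality(path))
```

Because the per-source iterations of the outer loop are independent of one another, the sources can in principle be divided among workers and the partial scores summed, which is one common way such computations are parallelized.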
