Modeling performances of concurrent big data applications

Big Data applications are characterized by a non-negligible number of complex parallel transactions on a huge amount of data that continuously varies and generally grows over time. Because of the amount of resources they need, the ideal runtime environment for these applications is a complex cloud computing and storage infrastructure, which provides a scalable degree of parallelism together with isolation between applications and resource abstraction. However, this additional degree of abstraction also makes performance modeling and decision making significantly more complex. The potential concurrency of many applications on the same cloud infrastructure has to be evaluated and, at the same time, the scalability of each application over time has to be studied through proper modeling practices, in order to predict the system behavior as usage patterns evolve and the load increases. For this purpose, in this paper, we propose an analytic modeling technique based on Markovian Agents and Mean Field Analysis that effectively describes different concurrent Big Data applications running on the same multi-site cloud infrastructure, accounting for their mutual interactions. The technique supports a careful evaluation of the real costs, risks, and benefits involved in correctly dimensioning and allocating resources and in verifying existing service level agreements. Copyright © 2014 John Wiley & Sons, Ltd.
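Mean Field Analysis of Markovian Agents replaces a huge population of interacting agents with a set of ordinary differential equations for the fraction of agents in each state. The sketch below illustrates that idea under loudly stated assumptions. It is not the model proposed in the paper. Each application is a hypothetical class of two-state agents (idle or busy), and the only interaction is an assumed contention term: every completion rate is divided by (1 + load), where load is the number of busy agents of all applications per unit of shared capacity. The names `AgentClass`, `load`, `drift`, and `solve`, as well as all rates and capacities, are illustrative.

```python
from dataclasses import dataclass


@dataclass
class AgentClass:
    """A class of identical Markovian agents, e.g. the tasks of one application.

    Hypothetical two-state agent: idle (I) or busy (B).
    """
    name: str
    population: int      # number of agents N_c in this class
    arrival_rate: float  # I -> B transition rate (new work arrives)
    service_rate: float  # B -> I rate on an unloaded infrastructure


def load(classes, busy, capacity):
    """Busy agents of all classes per unit of shared capacity (assumed metric)."""
    return sum(c.population * b for c, b in zip(classes, busy)) / capacity


def drift(classes, busy, capacity):
    """Right-hand side of the mean field ODEs: d b_c / dt for each class c.

    Agents interact only through load(): every completion rate is divided
    by (1 + load), so concurrent applications slow each other down.
    """
    slowdown = 1.0 + load(classes, busy, capacity)
    return [
        c.arrival_rate * (1.0 - b) - (c.service_rate / slowdown) * b
        for c, b in zip(classes, busy)
    ]


def solve(classes, capacity, t_end=50.0, dt=0.01):
    """Integrate the mean field ODEs with forward Euler, starting from all idle."""
    busy = [0.0] * len(classes)
    for _ in range(int(round(t_end / dt))):
        rates = drift(classes, busy, capacity)
        busy = [b + dt * r for b, r in zip(busy, rates)]
    return busy


# Usage: application A alone vs. A and B sharing the same site.
app_a = AgentClass("A", population=200, arrival_rate=1.0, service_rate=4.0)
app_b = AgentClass("B", population=100, arrival_rate=0.5, service_rate=2.0)
alone = solve([app_a], capacity=100.0)
together = solve([app_a, app_b], capacity=100.0)
print(f"A busy fraction alone: {alone[0]:.3f}, with B: {together[0]:.3f}")
```

Adding application B raises the shared load. This lowers A's effective completion rate and raises its steady-state busy fraction, which is the kind of mutual interaction between concurrent applications that a mean field model can capture cheaply. Fixed-step Euler keeps the sketch dependency-free. Larger or stiffer models would more likely use an adaptive ODE solver.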
