Reactive Resource Provisioning Heuristics for Dynamic Dataflows on Cloud Infrastructure

Low-latency analysis over high-velocity data streams motivates distributed continuous dataflow systems. Contemporary stream processing systems use simple techniques to scale on elastic cloud resources when data rates vary. However, application QoS is also affected by the variability in resource performance that clouds exhibit, which calls for autonomic methods of provisioning elastic resources to support such applications on cloud infrastructure. We develop the concept of “dynamic dataflows”, which use alternate tasks as an additional control over the dataflow's cost and QoS. Further, we formalize an optimization problem that represents deployment and runtime resource provisioning and allows us to balance the application's QoS, value, and resource cost. We propose two greedy heuristics, centralized and sharded, based on the variable-sized bin packing algorithm, and compare them against a Genetic Algorithm (GA) based heuristic that gives a near-optimal solution. A large-scale simulation study, using the Linear Road benchmark and VM performance traces from the AWS public cloud, shows that while the GA-based heuristic provides a better-quality schedule, the greedy heuristics are more practical and can intelligently use cloud elasticity to mitigate the effect of variability, in both input data rates and cloud resource performance, to meet the QoS of fast data applications.
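
To make the idea of a greedy heuristic based on variable-sized bin packing concrete, the following is a minimal sketch and not the heuristics proposed in this work. It assumes each task has a single scalar resource demand and each VM type has a scalar capacity and an hourly cost; the names `VMType`, `VM`, and `pack_tasks`, as well as the capacities and prices in the example, are hypothetical. It places tasks in decreasing order of demand onto the first running VM with enough free capacity, and otherwise starts the cheapest VM type that can hold the task.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VMType:
    """Hypothetical VM class: a bin size with an associated price."""
    name: str
    capacity: float       # abstract processing capacity units (assumption)
    cost_per_hour: float  # illustrative price, not a real cloud price


@dataclass
class VM:
    """A provisioned VM instance (an open bin)."""
    vm_type: VMType
    tasks: list = field(default_factory=list)
    used: float = 0.0

    def free(self) -> float:
        return self.vm_type.capacity - self.used


def pack_tasks(demands, vm_types):
    """First-fit-decreasing packing of tasks onto variable-sized VMs.

    demands: dict mapping task name -> scalar resource demand.
    vm_types: list of VMType choices available from the provider.
    Returns the list of provisioned VMs with their assigned tasks.
    """
    largest = max(t.capacity for t in vm_types)
    vms = []
    # Place the most demanding tasks first.
    for task, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > largest:
            raise ValueError(f"task {task!r} does not fit on any VM type")
        # Reuse the first running VM with enough free capacity.
        target = next((vm for vm in vms if vm.free() >= demand), None)
        if target is None:
            # Otherwise start the cheapest VM type that can hold the task.
            cheapest = min(
                (t for t in vm_types if t.capacity >= demand),
                key=lambda t: t.cost_per_hour,
            )
            target = VM(cheapest)
            vms.append(target)
        target.tasks.append(task)
        target.used += demand
    return vms


if __name__ == "__main__":
    types = [VMType("small", 1.0, 1.0), VMType("large", 4.0, 3.5)]
    plan = pack_tasks({"a": 3.0, "b": 1.0, "c": 0.5, "d": 0.5}, types)
    for vm in plan:
        print(vm.vm_type.name, vm.tasks, vm.used)
```

A runtime variant of such a sketch would re-run the packing when observed data rates or VM performance drift, which is where alternate tasks and the QoS, value, and cost trade-off described above would enter; those aspects are not modeled here.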
