Exploiting application dynamism and cloud elasticity for continuous dataflows

Contemporary continuous data flow systems use elastic scaling on distributed cloud resources to handle variable data rates and to meet applications' needs while attempting to maximize resource utilization. However, virtualized clouds present an added challenge due to the variability in resource performance - over time and space - thereby impacting the application's QoS. Elastic use of cloud resources and their allocation to continuous dataflow tasks need to adapt to such infrastructure dynamism. In this paper, we develop the concept of “dynamic dataflows” as an extension to continuous dataflows that utilizes alternate tasks and allows additional control over the dataflow's cost and QoS. We formalize an optimization problem to perform both deployment and runtime cloud resource management for such dataflows, and define an objective function that allows trade-off between the application's value against resource cost. We present two novel heuristics, local and global, based on the variable sized bin packing heuristics to solve this NP-hard problem. We evaluate the heuristics against a static allocation policy for a dataflow with different data rate profiles that is simulated using VM performance traces from a private cloud data center. The results show that the heuristics are effective in intelligently utilizing cloud elasticity to mitigate the effect of both input data rate and cloud resource performance variabilities on QoS.

[1]  Rajkumar Buyya,et al.  CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms , 2011, Softw. Pract. Exp..

[2]  Wil M. P. van der Aalst,et al.  Workflow Patterns , 2003, Distributed and Parallel Databases.

[3]  S. Carlsen Action Port Model: a mixed paradigm conceptual workflow modeling language , 1998, Proceedings. 3rd IFCIS International Conference on Cooperative Information Systems (Cat. No.98EX122).

[4]  Jano I. van Hemert,et al.  Scientific Workflow: A Survey and Research Directions , 2007, PPAM.

[5]  Sungsoo Park,et al.  Algorithms for the variable sized bin packing problem , 2003, Eur. J. Oper. Res..

[6]  Jacques Wainer,et al.  Constraint-Based Flexible Workflows , 2003, CRIWG.

[7]  Guochuan Zhang,et al.  On Variable-Sized Bin Packing , 2007 .

[8]  Alexandru Iosup,et al.  Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing , 2011, IEEE Transactions on Parallel and Distributed Systems.

[9]  Scott Shenker,et al.  Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters , 2012, HotCloud.

[10]  Edward A. Lee,et al.  Scientific workflow management and the Kepler system , 2006, Concurr. Comput. Pract. Exp..

[11]  Omer F. Rana,et al.  End-to-End QoS on Shared Clouds for Highly Dynamic, Large-Scale Sensing Data Streams , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).

[12]  Radu Prodan,et al.  Dynamic scheduling of scientific workflow applications on the grid: a case study , 2005, SAC '05.

[13]  Tao Yu,et al.  Efficient algorithms for Web services selection with end-to-end QoS constraints , 2007, TWEB.

[14]  Gero Mühl,et al.  QoS aggregation for Web service composition using workflow patterns , 2004, Proceedings. Eighth IEEE International Enterprise Distributed Object Computing Conference, 2004. EDOC 2004..

[15]  Rajkumar Buyya,et al.  Cost-based scheduling of scientific workflow applications on utility grids , 2005, First International Conference on e-Science and Grid Computing (e-Science'05).

[16]  Salim Hariri,et al.  Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing , 2002, IEEE Trans. Parallel Distributed Syst..

[17]  Yogesh L. Simmhan,et al.  The Trident Scientific Workflow Workbench , 2008, 2008 IEEE Fourth International Conference on eScience.

[18]  Jennifer Widom,et al.  Continuous queries over data streams , 2001, SGMD.

[19]  Vlad Trifa,et al.  Interacting with the SOA-Based Internet of Things: Discovery, Query, Selection, and On-Demand Provisioning of Web Services , 2010, IEEE Transactions on Services Computing.

[20]  Toyotaro Suzumura,et al.  Elastic Stream Computing with Clouds , 2011, 2011 IEEE 4th International Conference on Cloud Computing.

[21]  Beng Chin Ooi,et al.  Efficient Dynamic Operator Placement in a Locally Distributed Continuous Query System , 2006, OTM Conferences.

[22]  Alexandru Iosup,et al.  On the Performance Variability of Production Cloud Services , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

[23]  Joseph M. Hellerstein,et al.  Flux: an adaptive partitioning operator for continuous query systems , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[24]  Leonardo Neumeyer,et al.  S4: Distributed Stream Computing Platform , 2010, 2010 IEEE International Conference on Data Mining Workshops.

[25]  Alexandros Labrinidis,et al.  CONFLuEnCE: Implementation and application design , 2011, 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom).

[26]  Mehdi Serairi,et al.  Heuristics for the variable sized bin-packing problem , 2009, Comput. Oper. Res..

[27]  SiegelHoward Jay,et al.  Task Matching and Scheduling in Heterogeneous Computing Environments Using a Genetic-Algorithm-Based Approach , 1997 .

[28]  Schahram Dustdar,et al.  WS-Aggregation: distributed aggregation of web services data , 2011, SAC.

[29]  Schahram Dustdar,et al.  Esc: Towards an Elastic Stream Computing Platform for the Cloud , 2011, 2011 IEEE 4th International Conference on Cloud Computing.

[30]  John Shalf,et al.  Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud , 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.

[31]  Rajkumar Buyya,et al.  A taxonomy of scientific workflow systems for grid computing , 2005, SGMD.

[32]  Maria E. Orlowska,et al.  Specification and validation of process constraints for flexible workflows , 2005, Inf. Syst..

[33]  Frederick Reiss,et al.  TelegraphCQ: continuous dataflow processing , 2003, SIGMOD '03.

[34]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[35]  Gary J. Nutt,et al.  The evolution towards flexible workflow systems , 1996, Distributed Syst. Eng..

[36]  Geoffrey C. Fox,et al.  Granules: A lightweight, streaming runtime for cloud computing with support, for Map-Reduce , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[37]  G. Bruce Berriman,et al.  Scientific workflow applications on Amazon EC2 , 2010, 2009 5th IEEE International Conference on E-Science Workshops.

[38]  Rajkumar Buyya,et al.  Scheduling scientific workflow applications with deadline and budget constraints using genetic algorithms , 2006, Sci. Program..

[39]  E. Michael Maximilien,et al.  A framework and ontology for dynamic Web services selection , 2004, IEEE Internet Computing.

[40]  Wil M. P. van der Aalst,et al.  Workflow Patterns , 2004, Distributed and Parallel Databases.

[41]  Dennis Gannon,et al.  Workflows for e-Science, Scientific Workflows for Grids , 2014 .

[42]  Yoonho Park,et al.  SPC: a distributed, scalable platform for data mining , 2006, DMSSP '06.

[43]  Claudio Soriente,et al.  StreamCloud: An Elastic and Scalable Data Streaming System , 2012, IEEE Transactions on Parallel and Distributed Systems.

[44]  Chao Tian,et al.  Nova: continuous Pig/Hadoop workflows , 2011, SIGMOD '11.

[45]  Alain Biem,et al.  IBM infosphere streams for scalable, real-time, intelligent transportation services , 2010, SIGMOD Conference.

[46]  R. F. Freund,et al.  Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems , 1999, J. Parallel Distributed Comput..

[47]  Gerhard J. Woeginger,et al.  Exact Algorithms for NP-Hard Problems: A Survey , 2001, Combinatorial Optimization.