Towards hybrid online on-demand querying of realtime data with stateful complex event processing

Emerging Big Data applications in areas such as e-commerce and the energy industry require both online and on-demand queries over large, fast-moving data arriving as streams. These requirements present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high-performance online query scheme that deals in particular with the velocity aspect of the three V's of Big Data. However, traditional CEP systems do not consider data variety, and they lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework that supports hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques for interactive query processing over realtime data, featuring efficient online querying, dynamic stream data persistence, and on-demand access. The system architecture is presented, and the current implementation status is reported.
