Querying of Time Series for Big Data Analytics

Time series data arise naturally in many fields of applied science and engineering, including statistics, signal processing, mathematical finance, and weather and power consumption forecasting. Although time series have been studied extensively, they still pose challenges to the scientific community. Advanced operations such as classification, segmentation, prediction, anomaly detection, and motif discovery are useful in machine learning and in other scientific fields. The advent of Big Data in almost every scientific domain motivates an in-depth study of state-of-the-art techniques for efficient querying of time series. This chapter aims to provide a comprehensive review of existing solutions for time series representation, processing, indexing, and querying operations.
