Quality of Similarity Rankings in Time Series

Time series data objects can be interpreted as high-dimensional vectors, which allows the application of many traditional distance measures as well as more specialized measures. However, many distance functions are known to suffer from poor contrast in high-dimensional settings, which calls their usefulness as similarity measures into question. On the other hand, shared-nearest-neighbor distances, which are based on the ranking of data objects induced by some primary distance measure, are known to improve performance in high-dimensional settings. In this paper, we study the performance of shared-neighbor similarity measures in the context of similarity search for time series data objects. We find that shared-neighbor similarity measures generally yield more stable performance than their associated primary distance measures.
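To make the idea of a shared-neighbor similarity concrete, the following is a minimal sketch rather than the measure evaluated in the paper. It assumes Euclidean distance as the primary distance, equal-length series, and a normalized overlap of k-nearest-neighbor sets as the secondary similarity; the function names `knn_sets` and `snn_similarity` and the toy data are hypothetical.

```python
import numpy as np


def knn_sets(series, k):
    """For each series, return the index set of its k nearest neighbors.

    Assumption: Euclidean distance is the primary distance, and each
    series counts as its own nearest neighbor (distance zero).
    """
    X = np.asarray(series, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Rank all objects by primary distance; stable sort keeps ties in index order.
    order = np.argsort(dist, axis=1, kind="stable")
    return [set(row[:k].tolist()) for row in order]


def snn_similarity(series, k):
    """Shared-neighbor similarity: size of the overlap of two k-NN sets, divided by k."""
    neighbors = knn_sets(series, k)
    n = len(neighbors)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = len(neighbors[i] & neighbors[j]) / k
    return sim


# Toy data (assumed for illustration): two well-separated groups of short series.
toy_series = [
    [0, 0, 0], [0, 0, 1], [0, 1, 0],
    [10, 10, 10], [10, 10, 11], [10, 11, 10],
]
toy_sim = snn_similarity(toy_series, k=3)
print(toy_sim)
```

Because the secondary similarity depends only on the neighbor ranking and not on the distance values themselves, any primary distance that produces a ranking could be substituted for the Euclidean distance in `knn_sets`.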
