Change-Point Detection in a Sequence of Bags-of-Data

This paper addresses a limitation common to most existing change-point detection methods by proposing a nonparametric, computationally efficient method. Most existing methods assume that the observation at each time step is a single multi-dimensional vector, but in many situations this assumption does not hold. We therefore consider a setting in which each observation is a collection of random variables, which we call a bag of data. After estimating the distribution underlying each bag of data and embedding those distributions in a metric space, we derive the change-point score by using a distance-based information estimator to evaluate how the sequence of distributions fluctuates in that space. We also incorporate a procedure that adaptively determines when to raise alerts by calculating a confidence interval for the change-point score at each time step. This avoids raising false alarms in highly noisy situations and enables the detection of changes of various magnitudes. Experimental studies and numerical examples on both synthetic and real datasets demonstrate the generality and effectiveness of our approach.
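To make the bag-of-data setting concrete, the following is a minimal sketch, not the method of the paper. It assumes one-dimensional bags, uses a quantile-based approximation of the Wasserstein-1 (earth mover's) distance as the metric between bags, and replaces the distance-based information estimator with a simple stand-in score: the mean distance between a reference window and a test window minus the mean distance within the two windows. The confidence-interval step is omitted. The names `w1_distance`, `change_scores`, and `window` are hypothetical.

```python
import numpy as np


def w1_distance(x, y, n_grid=100):
    """Approximate the 1-D Wasserstein-1 distance between two samples.

    Uses the identity W1 = integral over u in (0, 1) of |Qx(u) - Qy(u)|,
    evaluated on a midpoint grid, so the bags may differ in size.
    """
    u = (np.arange(n_grid) + 0.5) / n_grid
    return float(np.mean(np.abs(np.quantile(x, u) - np.quantile(y, u))))


def change_scores(bags, window=5):
    """Stand-in change-point score for a sequence of 1-D bags (assumption).

    score[t] = mean distance between bags in [t - window, t) and
               bags in [t, t + window), minus the average of the two
               within-window mean distances. Undefined positions are NaN.
    """
    n = len(bags)
    # Pairwise distance matrix between all bags.
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = w1_distance(bags[i], bags[j])

    scores = np.full(n, np.nan)
    for t in range(window, n - window + 1):
        ref = np.arange(t - window, t)
        test = np.arange(t, t + window)
        between = dist[np.ix_(ref, test)].mean()
        within = 0.5 * (dist[np.ix_(ref, ref)].mean()
                        + dist[np.ix_(test, test)].mean())
        scores[t] = between - within
    return scores


# Synthetic example: 40 bags of varying size, mean shifts at t = 20.
rng = np.random.default_rng(0)
bags = [
    rng.normal(0.0 if t < 20 else 3.0, 1.0, size=rng.integers(30, 60))
    for t in range(40)
]
scores = change_scores(bags, window=5)
print("largest score at t =", int(np.nanargmax(scores)))
```

In this sketch the score is largest where the reference and test windows fall on opposite sides of the shift. An adaptive alert rule such as the one described above would additionally need a confidence interval for each score, which is not implemented here.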
