Nature-inspired approaches for distance metric learning in multivariate time series classification

The applicability of time series data mining in many different fields has motivated the scientific community to develop new methods that improve classifier performance on this particular class of data. In this context the related literature has extensively shown that dynamic time warping is the similarity measure of choice when univariate time series are considered. However, possible statistical coupling among different dimensions makes the generalization of this metric to the multivariate case far from obvious. This has ignited the interest of the community in new distance definitions capable of capturing such inter-dimension dependences. In this paper we propose a simple dynamic time warping based distance that finds the best weighted combination of the dependent approach (where a multivariate time series is treated as a whole) and the independent approach (where a multivariate time series is just a collection of unrelated univariate time series) for the time series to be classified. A benchmark of four heuristic wrappers, namely simulated annealing, particle swarm optimization, estimation of distribution algorithms and genetic algorithms, is used to evolve the set of weighting coefficients towards maximizing the cross-validated predictive score of the classifiers. In this context one of the most recurring classifiers is nearest neighbor, coupled with a distance that, as mentioned above, has in most cases been dynamic time warping. The performance of the proposed approach is validated over datasets widely utilized in the related literature, from which it is concluded that the obtained performance gains can be increased by properly decoupling the influence of each dimension in the definition of the dependent dynamic time warping distance.
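The weighted combination described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the squared-difference local cost, and the use of a single scalar weight `w` in [0, 1] are assumptions made for illustration only (the abstract speaks of a set of weighting coefficients, whose exact form is not given here). Series are assumed to be lists of equal-dimension tuples, one tuple per time step.

```python
from math import inf

def dtw(cost):
    """Accumulated DTW cost over a precomputed local cost matrix (list of rows)."""
    n, m = len(cost), len(cost[0])
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard recursion: match, insertion, or deletion.
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[n][m]

def dtw_dependent(a, b):
    """Dependent DTW: one warping path shared by all dimensions."""
    cost = [[sum((x - y) ** 2 for x, y in zip(p, q)) for q in b] for p in a]
    return dtw(cost)

def dtw_independent(a, b):
    """Independent DTW: each dimension is warped on its own, then summed."""
    total = 0.0
    for d in range(len(a[0])):
        cost = [[(p[d] - q[d]) ** 2 for q in b] for p in a]
        total += dtw(cost)
    return total

def dtw_weighted(a, b, w):
    """Hypothetical convex combination of both distances (w is an assumption)."""
    return w * dtw_dependent(a, b) + (1.0 - w) * dtw_independent(a, b)

# Two short 2-dimensional series of different lengths.
a = [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0)]
b = [(0.0, 0.0), (2.0, 1.0)]
d_mixed = dtw_weighted(a, b, 0.5)
```

In a wrapper setting, a heuristic such as simulated annealing or a genetic algorithm would search over `w` (or over a vector of such weights), scoring each candidate by the cross-validated accuracy of a nearest neighbor classifier that uses `dtw_weighted` as its distance.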
