Segment and combine: a generic approach for supervised learning of invariant classifiers from topologically structured data

A generic method for supervised classification of structured objects is presented. The approach induces a classifier by (i) deriving a surrogate dataset from a pre-classified dataset of structured objects by segmenting them into pieces, (ii) learning a model relating pieces to object classes, and (iii) classifying structured objects by combining the predictions made for their pieces. Segmentation makes it possible to exploit local information and can be adapted to inject invariances into the resulting classifier. The framework is illustrated on practical sequence, time-series, and image classification problems.
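The three steps above can be sketched for one-dimensional time series as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation. The piece-level model is a nearest-centroid rule chosen to keep the example dependency-free, pieces are random fixed-length subsequences, predictions are combined by majority vote, and each piece is mean-centered as one possible way of injecting an invariance (here, to vertical offset). All function names and parameters (`segment`, `fit_centroids`, `predict`, `length`, `n_pieces`) are hypothetical.

```python
import random
from collections import defaultdict


def center(piece):
    """Subtract the piece mean (assumed invariance to vertical offset)."""
    mean = sum(piece) / len(piece)
    return [v - mean for v in piece]


def segment(series, length, n_pieces, rng):
    """Draw n_pieces random contiguous subsequences of a fixed length."""
    last_start = len(series) - length
    starts = [rng.randrange(last_start + 1) for _ in range(n_pieces)]
    return [center(series[s:s + length]) for s in starts]


def fit_centroids(objects, labels, length, n_pieces, rng):
    """Steps (i) and (ii): label every piece with its parent object's class,
    then learn a piece-level model (here, one mean piece per class)."""
    sums = {}
    counts = defaultdict(int)
    for obj, label in zip(objects, labels):
        for piece in segment(obj, length, n_pieces, rng):
            acc = sums.setdefault(label, [0.0] * length)
            for i, v in enumerate(piece):
                acc[i] += v
            counts[label] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(obj, centroids, length, n_pieces, rng):
    """Step (iii): classify each piece, then combine by majority vote."""
    votes = defaultdict(int)
    for piece in segment(obj, length, n_pieces, rng):
        nearest = min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(piece, centroids[y])),
        )
        votes[nearest] += 1
    return max(votes, key=votes.get)


# Toy data (invented for illustration): rising versus falling ramps.
rng = random.Random(0)
train = [list(range(20)), list(range(5, 25)),
         list(range(19, -1, -1)), list(range(30, 10, -1))]
train_labels = ["rising", "rising", "falling", "falling"]
centroids = fit_centroids(train, train_labels, length=5, n_pieces=10, rng=rng)

# A rising ramp at an offset never seen during training.
unseen = list(range(100, 120))
label = predict(unseen, centroids, length=5, n_pieces=10, rng=rng)
```

Because every piece is mean-centered before it reaches the model, the classifier in this sketch depends only on the local shape of the series, which is why a ramp at an unseen offset is handled the same way as the training ramps. Swapping `center` for another transformation, or changing how `segment` samples pieces, is where a different invariance would be introduced.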
