Modeling Position Specificity in Sequence Kernels by Fuzzy Equivalence Relations

This paper demonstrates that several known sequence kernels can be expressed in a unified framework in which position specificity is modeled by fuzzy equivalence relations. Beyond this interpretation, we address the practical issues of positive semidefiniteness, computational complexity, and the extraction of interpretable features from the final support vector machine classifier.

Keywords: fuzzy equivalence relation, kernel, sequence classification, support vector machines.
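
As an illustration only, the Python sketch below assumes one simple form of position-aware k-mer kernel. K(x, y) sums a position similarity E(p, q) over every pair of identical k-mers occurring at position p in x and position q in y. The function names and this exact kernel form are our assumptions, not the paper's definitions. Choosing E(p, q) = 1 for all positions recovers the spectrum kernel. The crisp equality E(p, q) = [p = q] counts only k-mers shared at the same position. E(p, q) = exp(-|p - q| / sigma) is a graded (fuzzy) equivalence relation with respect to the product t-norm.

```python
import math
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer of seq to the list of its start positions."""
    occ = defaultdict(list)
    for p in range(len(seq) - k + 1):
        occ[seq[p:p + k]].append(p)
    return occ


def spectrum_similarity(p, q):
    # Trivial equivalence: all positions are fully equivalent (position ignored).
    return 1.0


def crisp_similarity(p, q):
    # Crisp equivalence: only identical positions are equivalent.
    return 1.0 if p == q else 0.0


def make_laplacian_similarity(sigma):
    # Hypothetical helper. exp(-|p - q| / sigma) is reflexive and symmetric, and it is
    # transitive w.r.t. the product t-norm because |p - r| <= |p - q| + |q - r|.
    def sim(p, q):
        return math.exp(-abs(p - q) / sigma)
    return sim


def position_kernel(x, y, k, sim):
    """Sum sim(p, q) over all pairs of identical k-mers at position p in x and q in y."""
    occ_x = kmer_positions(x, k)
    occ_y = kmer_positions(y, k)
    total = 0.0
    for kmer, ps in occ_x.items():
        for p in ps:
            # .get does not insert missing keys into the defaultdict
            for q in occ_y.get(kmer, ()):
                total += sim(p, q)
    return total


if __name__ == "__main__":
    lap = make_laplacian_similarity(1.0)
    for name, sim in [("spectrum", spectrum_similarity),
                      ("crisp", crisp_similarity),
                      ("laplacian", lap)]:
        print(name, position_kernel("ACGT", "TACG", 2, sim))
```

For "ACGT" and "TACG" with k = 2, the shared 2-mers AC and CG are each shifted by one position. The spectrum similarity therefore counts both, the crisp similarity counts neither, and the exponential similarity gives each a partial weight of exp(-1). As sigma tends to 0 the exponential similarity approaches the crisp case, and as sigma grows it approaches the spectrum case. Because exp(-|p - q| / sigma) is also a positive definite (Laplacian) kernel on positions, the resulting K is positive semidefinite. The nested loops are a naive sketch whose cost grows with the product of the sequence lengths in the worst case. They are not meant as an efficient implementation.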
