Support Vector Machines for Dyadic Data

We describe a new technique for the analysis of dyadic data, in which two sets of objects (row objects and column objects) are characterized by a matrix of numerical values describing their mutual relationships. The technique, called the potential support vector machine (P-SVM), is a large-margin method for constructing classifiers and regression functions for the column objects. In contrast to standard support vector machine approaches, the P-SVM minimizes a scale-invariant capacity measure and requires a new set of constraints. As a result, the P-SVM yields a usually sparse expansion of the classification and regression functions in terms of the row objects rather than the column objects, and it can handle data and kernel matrices that are neither positive definite nor square. We then describe two complementary regularization schemes. The first improves generalization performance for classification and regression tasks; the second selects a small, informative set of row support objects and can be applied to feature selection. Benchmarks for classification, regression, and feature selection are performed on toy data and on several real-world data sets. The results show that the new method is at least competitive with, and often better than, the benchmarked standard methods on both standard vectorial and true dyadic data sets. In addition, we provide a theoretical justification for the new approach.
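To make the idea of a sparse expansion over row objects concrete, the following minimal sketch fits a classifier for the column objects of a non-square dyadic matrix. It is not the P-SVM optimization problem: as a stand-in it uses L1-penalized least squares solved by iterative soft-thresholding (ISTA), chosen only because it also produces a sparse coefficient vector. The matrix orientation (rows are row objects, columns are the objects to be labeled), the function names, the toy data, and the parameter values are all assumptions made for illustration.

```python
import numpy as np


def fit_sparse_row_expansion(K, y, lam=0.1, n_iter=500):
    """Fit f(column j) = sum_i alpha[i] * K[i, j] + b with a sparse alpha.

    K : (P, L) dyadic matrix, P row objects by L column objects (need not be square).
    y : (L,) labels in {-1, +1} for the column objects.
    Stand-in solver (assumption): L1-penalized least squares via ISTA, not the P-SVM.
    """
    P, L = K.shape
    A = np.hstack([K.T, np.ones((L, 1))])      # (L, P + 1); last column carries the offset b
    step = L / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the smooth part
    w = np.zeros(P + 1)
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y) / L           # gradient of the mean squared error term
        w = w - step * grad
        # Soft-threshold the row-object coefficients only; the offset is not penalized.
        w[:P] = np.sign(w[:P]) * np.maximum(np.abs(w[:P]) - step * lam, 0.0)
    return w[:P], w[P]


def predict(K_new, alpha, b):
    """Classify column objects described by their relations to the same row objects."""
    return np.where(K_new.T @ alpha + b >= 0, 1.0, -1.0)


# Toy dyadic data (hypothetical): 30 row objects, 200 column objects.
rng = np.random.default_rng(0)
P, L = 30, 200
K = rng.normal(size=(P, L))
alpha_true = np.zeros(P)
alpha_true[[0, 1, 2]] = [2.0, -1.5, 1.0]       # only three row objects are informative
y = np.where(K.T @ alpha_true >= 0, 1.0, -1.0)

alpha, b = fit_sparse_row_expansion(K, y)
n_support = int(np.count_nonzero(alpha))       # row objects kept in the expansion
accuracy = float(np.mean(predict(K, alpha, b) == y))
print(f"row support objects: {n_support} of {P}, training accuracy: {accuracy:.2f}")
```

The row objects with nonzero coefficients play the role that the abstract assigns to row support objects: the decision function depends only on them, which is why such an expansion can double as feature selection when the row objects are features.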
