Classification, Regression, and Feature Selection

We describe a new technique for the analysis of data given in matrix form. We consider two sets of objects, the “row” and the “column” objects, and represent them by a matrix of numerical values which describe their mutual relationships. We then introduce the “Potential Support Vector Machine” (P-SVM), a large-margin method for constructing classifiers and regression functions for the “column” objects. Contrary to standard support vector machine (SVM) approaches, the P-SVM minimizes a scale-invariant capacity measure under a new set of constraints. As a result, the P-SVM can handle data matrices which are neither positive definite nor square, and it leads to an expansion of the classification boundary or the regression function in terms of the “row” rather than the “column” objects; this expansion is usually sparse. We introduce two complementary regularization schemes to avoid overfitting on noisy data sets. The first scheme improves generalization performance for classification and regression problems; the second selects a small and informative set of “row” objects and can be applied to feature selection. A fast optimization algorithm based on the “Sequential Minimal Optimization” (SMO) technique is provided. We apply the new method to two kinds of data representations. The first uses feature vectors for both the “row” and the “column” objects and constructs a Gram matrix from these vectors using a kernel function. Benchmark results show that the P-SVM method is competitive with or superior to standard methods for classification and regression, and has the additional advantage that kernel functions are no longer restricted to be positive definite. The second representation uses a measured matrix of mutual relations between objects rather than vectorial data. The new classification
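To make the two data representations concrete, the sketch below constructs both kinds of input matrices. It is a minimal illustration under our own assumptions, not the P-SVM implementation itself: the sigmoid kernel, the array shapes, and all names (`tanh_kernel`, `row_vectors`, `column_vectors`) are chosen only for the example, and the P-SVM optimizer is not shown.

```python
# A minimal sketch of the two data representations described in the abstract.
import numpy as np

def tanh_kernel(x, z, kappa=1.0, theta=0.0):
    # A sigmoid kernel: a standard example of a kernel that is not
    # positive definite, which the P-SVM can nevertheless handle.
    return np.tanh(kappa * np.dot(x, z) + theta)

rng = np.random.default_rng(0)

# Representation 1: "row" and "column" objects given as feature vectors;
# the data matrix is a kernel (Gram) matrix. It is square only when the
# row and column objects coincide.
row_vectors = rng.normal(size=(30, 5))      # 30 "row" objects
column_vectors = rng.normal(size=(50, 5))   # 50 "column" objects
K = np.array([[tanh_kernel(r, c) for c in column_vectors]
              for r in row_vectors])        # shape (30, 50): rectangular

# Representation 2: mutual relations measured directly (e.g. pairwise
# similarities), with no underlying feature vectors at all.
K_measured = rng.normal(size=(30, 50))      # any real-valued matrix
```

Either matrix can serve as input; the P-SVM then expands the classifier or regression function in terms of the 30 “row” objects, typically with a sparse set of coefficients.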
