A Unifying View of Multiple Kernel Learning

Recent research on multiple kernel learning has led to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches differ in their objective formulations and regularization strategies. In this paper we present a unifying optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically, using a Rademacher complexity bound on the generalization error, and empirically, in a set of experiments.
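
For orientation, unifying criteria of this kind are commonly written as a joint regularized risk minimization over the kernel weights and the per-kernel functions. The display below is only a generic sketch under assumed notation (the loss ℓ, regularization parameter λ, norm parameter p, and reproducing kernel Hilbert spaces H_1, ..., H_M are placeholders, not taken from this abstract):

\[
\min_{\theta \ge 0,\ \|\theta\|_p \le 1}\ \ \min_{f_m \in \mathcal{H}_m,\ b \in \mathbb{R}}\ \ \sum_{i=1}^{n} \ell\!\Big(\sum_{m=1}^{M} f_m(x_i) + b,\ y_i\Big) \;+\; \frac{\lambda}{2} \sum_{m=1}^{M} \frac{\|f_m\|_{\mathcal{H}_m}^{2}}{\theta_m}.
\]

In this generic form, choosing p = 1 encourages sparse kernel weights (classical l1-norm MKL), whereas p > 1 yields non-sparse kernel combinations; block-norm and elastic-net style MKL variants arise from analogous choices of the regularization term.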
