Multiple Kernel Learning with Gaussianity Measures

Kernel methods are known to be effective for nonlinear multivariate analysis. One of the main issues in their practical use is the choice of kernel, and kernel selection and kernel learning have been studied extensively. Multiple kernel learning (MKL) is one of the most promising kernel optimization approaches. Kernel methods have been applied to various classifiers, including Fisher discriminant analysis (FDA). FDA yields the Bayes-optimal classification axis when the class-conditional distributions in the feature space are Gaussian with a shared covariance structure. Based on this fact, an MKL framework built on the notion of Gaussianity is proposed. As a concrete implementation, an empirical characteristic function is adopted to measure Gaussianity in the feature space associated with a convex combination of kernel functions, and two MKL algorithms are derived. Experimental results on several data sets show that the proposed kernel learning followed by FDA offers strong classification performance.
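To make the idea concrete, the sketch below illustrates one possible way to score the Gaussianity of data in the empirical feature space induced by a convex combination of kernels, using an empirical characteristic function (ECF). It is a minimal illustration under assumptions, not the paper's algorithm: the RBF base kernels, the use of kernel PCA as the empirical feature map, the random sampling of frequency vectors, and all function names (rbf_gram, combined_gram, empirical_feature_map, ecf_gaussianity) are choices made here for the sketch.

```python
# A minimal, hypothetical sketch (not the paper's implementation): score how
# Gaussian each class looks in the empirical feature space of a convex
# combination of RBF kernels, using an empirical characteristic function (ECF).
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def combined_gram(X, gammas, weights):
    """Convex combination K = sum_m w_m K_m with w_m >= 0 and sum_m w_m = 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wm * rbf_gram(X, g) for wm, g in zip(w, gammas))

def empirical_feature_map(K, dim=5):
    """Low-dimensional coordinates in the empirical feature space (kernel PCA)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:dim]         # leading eigenpairs
    vals, vecs = np.clip(vals[idx], 1e-12, None), vecs[:, idx]
    return vecs * np.sqrt(vals)                # n x dim sample coordinates

def ecf_gaussianity(Z, n_freq=20, scale=1.0, seed=0):
    """Mean squared gap between the ECF of Z and the characteristic function of
    a Gaussian with matching mean and covariance; smaller = more Gaussian."""
    rng = np.random.default_rng(seed)
    mu, Sigma = Z.mean(axis=0), np.cov(Z, rowvar=False)
    T = rng.normal(scale=scale, size=(n_freq, Z.shape[1]))    # frequency vectors
    ecf = np.exp(1j * Z @ T.T).mean(axis=0)
    gcf = np.exp(1j * T @ mu - 0.5 * np.einsum('fi,ij,fj->f', T, Sigma, T))
    return float(np.mean(np.abs(ecf - gcf) ** 2))

# Toy comparison of two fixed kernel weightings on two-class data.
rng = np.random.default_rng(1)
X0, X1 = rng.normal(size=(60, 2)), rng.normal(size=(60, 2)) + 3.0
X, y = np.vstack([X0, X1]), np.array([0] * 60 + [1] * 60)
for w in ([0.9, 0.1], [0.1, 0.9]):
    Z = empirical_feature_map(combined_gram(X, gammas=[0.1, 10.0], weights=w))
    score = sum(ecf_gaussianity(Z[y == c]) for c in (0, 1))   # per-class sum
    print(w, score)
```

In an MKL setting of the kind the abstract describes, such a per-class discrepancy would presumably serve as the objective driving the optimization of the kernel weights on the simplex, with kernel FDA applied afterwards; the loop above merely compares two fixed weightings on toy data.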
