Covering numbers for support vector machines

Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization performance of SV machines (within Valiant's probably approximately correct framework) took no account of the kernel used except in its effect on the margin and radius. More recently, it has been shown that one can bound the relevant covering numbers using tools from functional analysis. In this paper, we show that the resulting bound can be greatly simplified. The new bound involves the eigenvalues of the integral operator induced by the kernel, and it shows that the effective dimension depends on the rate of decay of these eigenvalues. We present an explicit calculation of covering numbers for an SV machine using a Gaussian kernel; the resulting bound is significantly better than that implied by previous results.
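To make the role of eigenvalue decay concrete, the following is a minimal numerical sketch, not the calculation carried out in the paper. It assumes a one-dimensional input drawn uniformly from [-1, 1], approximates the eigenvalues of the integral operator by those of the scaled Gram matrix K/n (a standard empirical approximation), and uses a simple count of eigenvalues above a threshold as a rough stand-in for an effective dimension. The function names, the sample distribution, the kernel widths, and the threshold are all illustrative assumptions.

```python
import numpy as np


def gaussian_gram(x, sigma):
    """Gram matrix of the Gaussian kernel exp(-(x - y)^2 / (2 sigma^2)) on 1-D points."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))


def operator_eigenvalues(x, sigma):
    """Approximate integral-operator eigenvalues by those of K / n, largest first.

    This is an empirical approximation (assumption), not an exact spectrum.
    """
    n = len(x)
    lam = np.linalg.eigvalsh(gaussian_gram(x, sigma) / n)  # ascending order
    # Reverse to descending order and clip tiny negative round-off values.
    return np.clip(lam[::-1], 0.0, None)


def count_above(lam, eps):
    """Hypothetical proxy for an effective dimension: eigenvalues larger than eps."""
    return int(np.sum(lam > eps))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)

for sigma in (0.25, 1.0):
    lam = operator_eigenvalues(x, sigma)
    print(f"sigma={sigma}: {count_above(lam, 1e-6)} eigenvalues above 1e-6")
```

Under these assumptions, the wider kernel should leave fewer eigenvalues above the threshold than the narrower one, which is the qualitative behaviour the abstract describes: faster eigenvalue decay corresponds to a smaller effective dimension.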