A Unified Framework for Regularization Networks and Support Vector Machines

Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples, in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning, which provides a general foundation for the learning problem by combining functional analysis and statistics.
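To make the idea of a "unified framework" concrete, here is a minimal sketch. It assumes the standard formulation in which both methods minimize a regularized functional over a reproducing kernel Hilbert space, H[f] = (1/l) sum_i V(y_i, f(x_i)) + lambda ||f||_K^2. The minimizer has the form f(x) = sum_i c_i K(x, x_i). Regularization Networks use the square loss V = (y - f(x))^2, and SVM regression uses Vapnik's epsilon-insensitive loss V = max(0, |y - f(x)| - epsilon). Several details are illustrative assumptions: the Gaussian kernel with sigma = 1, epsilon = 0.1, the toy data, and every function name. The bias term b often added to SVMs is omitted. Only the square-loss case is actually solved, in closed form via (K + l lambda I) c = y. The SVM functional is merely evaluated, because training an SVM would require a quadratic program that is not shown here.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); positive definite, so it defines an RKHS.
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def square_loss(y, fx):
    # Loss V used by Regularization Networks.
    return (y - fx) ** 2


def eps_insensitive_loss(y, fx, eps=0.1):
    # Vapnik's epsilon-insensitive loss V used by SVM regression.
    return max(0.0, abs(y - fx) - eps)


def gram(X, kernel):
    # Gram matrix K_ij = K(x_i, x_j).
    return [[kernel(a, b) for b in X] for a in X]


def predict(c, X, x, kernel):
    # f(x) = sum_i c_i K(x, x_i)
    return sum(ci * kernel(x, xi) for ci, xi in zip(c, X))


def functional(c, K, y, lam, loss):
    # H[f] = (1/l) sum_i V(y_i, f(x_i)) + lam * ||f||_K^2, where ||f||_K^2 = c^T K c.
    l = len(y)
    f = [sum(K[i][j] * c[j] for j in range(l)) for i in range(l)]
    data_term = sum(loss(yi, fi) for yi, fi in zip(y, f)) / l
    norm2 = sum(c[i] * f[i] for i in range(l))
    return data_term + lam * norm2


def solve(A, b):
    # Dense linear solve by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fit_regularization_network(X, y, lam, kernel):
    # With the square loss, setting the gradient of H to zero gives (K + l*lam*I) c = y.
    l = len(y)
    K = gram(X, kernel)
    A = [[K[i][j] + (l * lam if i == j else 0.0) for j in range(l)] for i in range(l)]
    return solve(A, y), K


# Hypothetical toy data: four 1-D inputs.
X = [[0.0], [0.5], [1.0], [1.5]]
y = [0.0, 0.48, 0.84, 1.0]
lam = 0.01
c, K = fit_regularization_network(X, y, lam, gaussian_kernel)
print("RN functional (square loss):", functional(c, K, y, lam, square_loss))
print("Same f under the SVM loss:  ", functional(c, K, y, lam, eps_insensitive_loss))
```

The two methods share the kernel expansion and the RKHS-norm penalty and differ only in the loss V. This is why a single framework can cover both. Swapping the loss changes the optimization problem (a linear system versus a quadratic program) but not the form of the solution.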
