Regularization Networks and Support Vector Machines

Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples, in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Function networks, for example, are a special case of both regularization networks and Support Vector Machines. We review both formulations in the context of Vapnik's theory of statistical learning, which provides a general foundation for the learning problem by combining functional analysis and statistics. The emphasis is on regression; classification is treated as a special case.
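To make the unification concrete, both families of techniques can be derived from a single regularization functional over a Reproducing Kernel Hilbert Space. The following is a standard statement of that framework, written in our own notation rather than quoted from the review: one minimizes an empirical loss plus a smoothness penalty,

```latex
\min_{f \in \mathcal{H}} \;
\frac{1}{\ell} \sum_{i=1}^{\ell} V\bigl(y_i, f(x_i)\bigr)
\;+\; \lambda \,\|f\|_{K}^{2},
\qquad
f^{*}(x) \;=\; \sum_{i=1}^{\ell} c_i \, K(x, x_i),
```

where $K$ is the kernel of the space and the minimizer $f^{*}$ has the finite kernel expansion above. The square loss $V(y, f(x)) = (y - f(x))^2$ yields Regularization Networks; Vapnik's $\epsilon$-insensitive loss yields Support Vector Machine regression; the hinge (soft margin) loss yields SVM classification. With a Gaussian kernel $K(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$, the square-loss case is exactly a Radial Basis Function network, which is the sense in which RBFs are a special case of both approaches.

In the square-loss case the expansion coefficients solve a linear system, $(K + \lambda \ell I)\,c = y$. The sketch below is a minimal illustration of that case, assuming a Gaussian kernel and synthetic data; the function names and parameter values are our own hypothetical choices, not taken from the review:

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_regularization_network(X, y, lam=1e-2, sigma=1.0):
    """Square-loss case: solve (K + lam * l * I) c = y for the coefficients c."""
    l = len(X)
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * l * np.eye(l), y)

def predict(X_train, c, X_new, sigma=1.0):
    """Evaluate f(x) = sum_i c_i K(x, x_i) at new points."""
    return rbf_kernel(X_new, X_train, sigma) @ c

# Usage: recover a noisy sine from 20 sparse samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
c = fit_regularization_network(X, y)
print(predict(X, c, np.array([[0.0], [1.5]])))
```

Swapping the square loss for the $\epsilon$-insensitive or hinge loss changes only the optimization problem (a quadratic program rather than a linear system); the form of the solution, a kernel expansion on the data points, stays the same.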
