Paper Information - A Unified Framework for Regularization Networks and Support Vector Machines

A Unified Framework for Regularization Networks and Support Vector Machines

Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples, in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning, which provides a general foundation for the learning problem by combining functional analysis and statistics.

[1] R. Courant,et al. Methods of Mathematical Physics , 1962 .

[2] N. Aronszajn. Theory of Reproducing Kernels. , 1950 .

[3] G. Lorentz. Approximation of Functions , 1966 .

[4] V. Hutson. Integral Equations , 1967, Nature.

[5] I. J. Schoenberg,et al. Cardinal interpolation and spline functions , 1969 .

[6] G. Wahba,et al. A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines , 1970 .

[7] Vladimir Vapnik, A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[8] Richard O. Duda,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[9] J. A. Cochran. The analysis of linear integral equations , 1973 .

[10] David M. Allen,et al. The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction , 1974 .

[11] J. Stewart. Positive definite functions and generalizations, an historical survey , 1976 .

[12] V. Ivanov,et al. The Theory of Approximate Methods and Their Application to the Numerical Solution of Singular Integr , 1978 .

[13] A. N. Tikhonov,et al. Solutions of ill-posed problems , 1977 .

[14] J. Rissanen,et al. Modeling By Shortest Data Description , 1978, Autom..

[15] L. Schumaker. Spline Functions: Basic Theory , 1981 .

[16] D. Pollard. Convergence of stochastic processes , 1984 .

[17] R. Dudley. A course on empirical processes , 1984 .

[18] Leslie G. Valiant,et al. A theory of the learnable , 1984, CACM.

[19] B. Silverman,et al. Spline Smoothing: The Equivalent Variable Kernel Method , 1984 .

[20] G. Wahba. A Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem , 1985 .

[21] S. Rippa,et al. Numerical Procedures for Surface Fitting of Scattered Data by Radial Functions , 1986 .

[22] R. Tibshirani,et al. Generalized additive models for medical research , 1986, Statistical methods in medical research.

[23] M. Bertero. Regularization methods for linear inverse problems , 1986 .

[24] C. Micchelli. Interpolation of scattered data: Distance matrices and conditionally positive definite functions , 1986 .

[25] Tomaso Poggio,et al. Probabilistic Solution of Ill-Posed Problems in Computational Vision , 1987 .

[26] M. Bertero,et al. Ill-posed problems in early vision , 1988, Proc. IEEE.

[27] G. Parisi,et al. Statistical Field Theory , 1988 .

[28] I. J. Schoenberg. Contributions to the Problem of Approximation of Equidistant Data by Analytic Functions , 1988 .

[29] F. Girosi,et al. Networks for approximation and learning , 1990, Proc. IEEE.

[30] M. Buhmann. Multivariate cardinal interpolation with radial-basis functions , 1990 .

[31] G. Wahba. Spline models for observational data , 1990 .

[32] Tomaso A. Poggio,et al. Extensions of a Theory of Networks for Approximation and Learning , 1990, NIPS.

[33] W. Madych,et al. Polyharmonic cardinal splines: a minimization property , 1990 .

[34] C. D. Boor,et al. Quasiinterpolants and Approximation Power of Multivariate Splines , 1990 .

[35] Christophe Rabut,et al. How to Build Quasi-Interpolants: Application to Polyharmonic B-Splines , 1991, Curves and Surfaces.

[36] F. Girosi. Models of Noise and Robust Estimates , 1991 .

[37] Tomaso Poggio,et al. Computational vision and regularization theory , 1985, Nature.

[38] F. Girosi. Models of Noise and Robust Estimation , 1991 .

[39] R. Dudley,et al. Uniform and universal Glivenko-Cantelli classes , 1991 .

[40] Léon Bottou,et al. Local Learning Algorithms , 1992, Neural Computation.

[41] C. Rabut. An Introduction to Schoenberg's Approximation , 1992 .

[42] Ronald R. Coifman,et al. Entropy-based algorithms for best basis selection , 1992, IEEE Trans. Inf. Theory.

[43] Yann LeCun,et al. Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.

[44] A. Ron,et al. On multivariate approximation by integer translates of a basis function , 1992 .

[45] Stéphane Mallat,et al. Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[46] I. Daubechies. Ten Lectures on Wavelets , 1992 .

[47] M. Buhmann. On quasi-interpolation with radial basis functions , 1993 .

[48] H. Mhaskar. Neural networks for localized approximation of real functions , 1993, Neural Networks for Signal Processing III - Proceedings of the 1993 IEEE-SP Workshop.

[49] D. Donoho,et al. Basis pursuit , 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.

[50] Wolfgang Härdle,et al. Applied Nonparametric Regression , 1991 .

[51] Robert E. Schapire,et al. Efficient distribution-free learning of probabilistic concepts , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[52] Philip M. Long,et al. Fat-shattering and the learnability of real-valued functions , 1994, COLT '94.

[53] Terrence J. Sejnowski,et al. Blind separation and blind deconvolution: an information-theoretic approach , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[54] Dana Ron,et al. An experimental and theoretical comparison of model selection methods , 1995, COLT '95.

[55] Tomaso A. Poggio,et al. Regularization Theory and Neural Networks Architectures , 1995, Neural Computation.

[56] Andrzej Cichocki,et al. A New Learning Algorithm for Blind Signal Separation , 1995, NIPS.

[57] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[58] B. Olshausen. Learning linear, sparse, factorial codes , 1996 .

[59] Federico Girosi,et al. On the Relationship between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions , 1996, Neural Computation.

[60] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[61] R W Prager,et al. Development of low entropy coding in a recurrent network. , 1996, Network.

[62] Erkki Oja,et al. The nonlinear PCA learning rule in independent component analysis , 1997, Neurocomputing.

[63] Bernhard Schölkopf,et al. Prior Knowledge in Support Vector Kernels , 1997, NIPS.

[64] Noga Alon,et al. Scale-sensitive dimensions, uniform convergence, and learnability , 1997, JACM.

[65] Terrence J. Sejnowski,et al. Learning Nonlinear Overcomplete Representations for Efficient Coding , 1997, NIPS.

[66] A. J. Bell,et al. A Unifying Information-Theoretic Framework for Independent Component Analysis , 2000 .

[67] Peter L. Bartlett,et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.

[68] Bernhard Schölkopf,et al. On a Kernel-Based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion , 1998, Algorithmica.

[69] Tomaso A. Poggio,et al. A Sparse Representation for Function Approximation , 1998, Neural Computation.

[70] John Shawe-Taylor,et al. Structural Risk Minimization Over Data-Dependent Hierarchies , 1998, IEEE Trans. Inf. Theory.

[71] Vladimir Vapnik,et al. Statistical learning theory , 1998 .

[72] D. Mackay,et al. Introduction to Gaussian processes , 1998 .

[73] N. Cristianini,et al. Robust Bounds on Generalization from the Margin Distribution , 1998 .

[74] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[75] John Shawe-Taylor,et al. Generalization Performance of Support Vector Machines and Other Pattern Classifiers , 1999 .

[76] Christopher J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition , 1998 .

[77] Simon Haykin,et al. Neural Networks: A Comprehensive Foundation , 1998 .

[78] Michael A. Saunders,et al. Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[79] R. DeVore,et al. Nonlinear approximation , 1998, Acta Numerica.

[80] Federico Girosi,et al. An Equivalence Between Sparse Approximation and Support Vector Machines , 1998, Neural Computation.

[81] Tomaso Poggio,et al. Incorporating prior information in machine learning by creating virtual examples , 1998, Proc. IEEE.

[82] Massimiliano Pontil,et al. A Note on Support Vector Machine Degeneracy , 1999, ALT.

[83] A. J. Bell,et al. A Unifying Information-Theoretic Framework for Independent Component Analysis , 2000 .

[84] Tomaso A. Poggio,et al. Sparse correlation kernel reconstruction , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[85] Massimiliano Pontil,et al. From regression to classification in support vector machines , 1999, ESANN.

[86] Massimiliano Pontil,et al. On the Vγ Dimension for Regression in Reproducing Kernel Hilbert Spaces , 1999, ALT.

[87] David Haussler,et al. Probabilistic kernel regression models , 1999, AISTATS.

[88] Olivier Chapelle,et al. Model Selection for Support Vector Machines , 1999, NIPS.

[89] Massimiliano Pontil,et al. On the Noise Model of Support Vector Machines Regression , 2000, ALT.

[90] Massimiliano Pontil,et al. A Note on the Generalization Performance of Kernel Classifiers with Margin , 2000, ALT.

[91] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[92] Tomaso A. Poggio,et al. Bounds on the Generalization Performance of Kernel Machine Ensembles , 2000, ICML.

[93] Robert A. Lordo,et al. Learning from Data: Concepts, Theory, and Methods , 2001, Technometrics.

[94] Bernhard Schölkopf,et al. Generalization Performance of Regularization Networks and Support Vector Machines via Entropy Numbers of Compact Operators , 1998 .