The Tradeoffs of Large Scale Learning

This contribution develops a theoretical framework that accounts for the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for small-scale and for large-scale learning problems. Small-scale learning problems are subject to the usual approximation-estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff in which the computational complexity of the underlying optimization algorithm plays a non-trivial role.
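To make the two regimes concrete, the tradeoff can be sketched as an excess-error decomposition (a minimal sketch reconstructed from the abstract's claim; the notation below is illustrative rather than quoted from the paper). Let $E(f)$ denote the expected risk, $f^{*}$ its minimizer over all functions, $f^{*}_{\mathcal{F}}$ its minimizer within the chosen family $\mathcal{F}$, $f_{n}$ the empirical risk minimizer on $n$ examples, and $\tilde{f}_{n}$ the approximate solution returned when the optimizer is stopped at tolerance $\rho$, so that $E_{n}(\tilde{f}_{n}) \le E_{n}(f_{n}) + \rho$. The excess error then splits into three terms:

\[
\mathcal{E} \;=\;
\underbrace{\mathbb{E}\!\left[E(f^{*}_{\mathcal{F}}) - E(f^{*})\right]}_{\mathcal{E}_{\mathrm{app}}}
\;+\;
\underbrace{\mathbb{E}\!\left[E(f_{n}) - E(f^{*}_{\mathcal{F}})\right]}_{\mathcal{E}_{\mathrm{est}}}
\;+\;
\underbrace{\mathbb{E}\!\left[E(\tilde{f}_{n}) - E(f_{n})\right]}_{\mathcal{E}_{\mathrm{opt}}}
\]

Under this reading, small-scale problems are constrained by the number of examples $n$: the tolerance $\rho$ can be driven to zero, $\mathcal{E}_{\mathrm{opt}}$ vanishes, and only the approximation term $\mathcal{E}_{\mathrm{app}}$ and the estimation term $\mathcal{E}_{\mathrm{est}}$ trade off against each other. Large-scale problems are instead constrained by computing time: the achievable $\rho$ depends on the per-iteration cost and convergence rate of the chosen optimization algorithm, so $\mathcal{E}_{\mathrm{opt}}$ enters the tradeoff alongside the other two terms.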
