Additive versus exponentiated gradient updates for linear prediction
[1] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.
[2] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.
[3] Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis, 1974, Wiley-Interscience.
[4] Bernard Widrow, et al. Adaptive Signal Processing, 1985.
[5] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.
[6] Vladimir Vovk. Aggregating strategies, 1990, COLT '90.
[7] N. Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms, 1990.
[8] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.
[9] David Haussler, et al. How to use expert advice, 1993, STOC.
[10] Philip M. Long, et al. Worst-case quadratic loss bounds for on-line prediction of linear functions by gradient descent, 1993.
[11] S. Haykin. Neural Networks: A Comprehensive Foundation, 1994.
[12] Manfred K. Warmuth, et al. The weighted majority algorithm, 1994, Information and Computation.
[13] David Haussler, et al. Tight worst-case loss bounds for predicting with expert advice, 1994, EuroCOLT.
[14] Shun-ichi Amari. Information geometry of the EM and em algorithms for neural networks, 1995, Neural Networks.
[15] Shun-ichi Amari. The EM algorithm and information geometry in neural network learning, 1995, Neural Computation.
[16] Manfred K. Warmuth, et al. The perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant, 1995, COLT '95.
[17] Manfred K. Warmuth, et al. Exponentiated gradient versus gradient descent for linear predictors, 1997, Information and Computation.