Paper information - Deriving and Analyzing Learning Algorithms

Deriving and Analyzing Learning Algorithms

Project Summary

There is a large variety of learning problems across all disciplines waiting for the right algorithms. Many of these are on-line problems, where the learning algorithm continually makes predictions and updates its hypothesis after getting each “correct” outcome. Efficient algorithms may be unable to keep the entire history, and thus must compress their experience into hypotheses. This leads to a tension when the algorithm predicts incorrectly: it must correct its hypothesis in case the same instance is seen again, yet the algorithm must move cautiously to preserve its previously acquired knowledge. One way to quantify this tradeoff is to put a distance measure on the space of possible hypotheses and optimize the improvement of the prediction on the last example versus the distance moved.

For the simple linear regression setting, Kivinen and Warmuth showed how two different distances lead to two radically different families of algorithms. One of these families makes additive updates to its hypothesis and includes the standard gradient descent methods. The other family makes multiplicative updates and has radically different performance. Amortized analysis techniques are used to prove relative loss bounds (similar to competitive ratios) on the algorithms, and these relative loss bounds provide a yardstick to measure the effectiveness of each learning family. Although neither family is better all of the time, the new multiplicative family performs exponentially better in many natural settings.

The proposed work will extend the framework of Kivinen and Warmuth in a variety of ways. The existing setup requires a fixed learning rate that must be carefully tuned, and an important proposed direction is to analyze annealed and self-tuned learning rates. The Boosting setting is different from, but closely related to, the on-line learning setting, and the second proposed direction is to modify the framework to cover boosting problems. Most current bounds compare the loss of the algorithm against the best fixed predictor, and the third main direction of the proposal is to extend the framework so that algorithms can be compared against shifting predictors that can change over time.
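
To make the contrast between the two update families concrete, the following is a minimal sketch for on-line linear regression with squared loss. It is an illustration under stated assumptions, not the proposal's own code: the function names (predict, gd_update, eg_update, run_online) are hypothetical, the additive family is represented by a plain gradient descent step (the update associated with squared Euclidean distance), and the multiplicative family is represented by an exponentiated-gradient style step with normalized weights (the update associated with relative entropy). Both use the fixed learning rate eta that the summary says must be tuned.

```python
import math


def predict(w, x):
    """Linear prediction: dot product of weight vector w and instance x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def gd_update(w, x, y, eta):
    """Additive update (assumed representative: gradient descent on squared loss)."""
    g = 2.0 * (predict(w, x) - y)  # derivative of (w.x - y)^2 w.r.t. the prediction
    return [wi - eta * g * xi for wi, xi in zip(w, x)]


def eg_update(w, x, y, eta):
    """Multiplicative update (assumed representative: exponentiated gradient).

    Each weight is scaled by exp(-eta * gradient component), then the weights
    are renormalized so they stay positive and sum to one.
    """
    g = 2.0 * (predict(w, x) - y)
    v = [wi * math.exp(-eta * g * xi) for wi, xi in zip(w, x)]
    s = sum(v)
    return [vi / s for vi in v]


def run_online(update, w, examples, eta):
    """On-line protocol: predict, incur the loss, then update the hypothesis."""
    total_loss = 0.0
    for x, y in examples:
        total_loss += (predict(w, x) - y) ** 2
        w = update(w, x, y, eta)
    return w, total_loss
```

Calling run_online(gd_update, w0, examples, eta) and run_online(eg_update, w0, examples, eta) on the same example sequence gives the cumulative loss of each family, which is the quantity that relative loss bounds compare against the loss of the best fixed predictor.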

Manfred K. Warmuth | D. Helmbold
