The weighted majority algorithm

The construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes is studied. It is assumed that the learner has reason to believe that one of some pool of known algorithms will perform well but does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. It is called the weighted majority algorithm and is shown to be robust with respect to errors in the data. Various versions of the weighted majority algorithm are discussed, and error bounds for them that are closely related to the error bounds of the best algorithms of the pool are proved.<<ETX>>

[1]  Esther Levin,et al.  A statistical approach to learning and generalization in layered neural networks , 1989, Proc. IEEE.

[2]  Manfred K. Warmuth,et al.  The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.

[3]  Manfred K. Warmuth,et al.  Learning Nested Differences of Intersection-Closed Concept Classes , 1989, COLT '89.

[4]  Vladimir Vovk,et al.  Aggregating strategies , 1990, COLT '90.

[5]  Wolfgang Maass,et al.  On-line learning with an oblivious environment and the power of randomization , 1991, COLT '91.

[6]  Alfredo De Santis,et al.  Learning probabilistic prediction functions , 1988, [Proceedings 1988] 29th Annual Symposium on Foundations of Computer Science.

[7]  N. Littlestone Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[8]  Nick Littlestone,et al.  From on-line to batch learning , 1989, COLT '89.

[9]  Vladimir Vapnik,et al.  Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[10]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[11]  N. Littlestone Mistake bounds and logarithmic linear-threshold learning algorithms , 1990 .

[12]  Carl H. Smith,et al.  Probability and Plurality for Aggregations of Learning Machines , 1987, Inf. Comput..

[13]  Leonard Pitt,et al.  Probabilistic inductive inference , 1989, JACM.

[14]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[15]  David Haussler,et al.  Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise , 1991, COLT '91.