Efficient Learning with Virtual Threshold Gates

We reduce the problem of learning simple geometric concept classes to that of learning disjunctions over exponentially many variables. We then apply an on-line algorithm called Winnow, whose number of prediction mistakes grows only logarithmically with the number of variables. The hypotheses of Winnow are linear threshold functions with one weight per variable. We find ways to maintain the exponentially many weights of Winnow implicitly, so that the time for the algorithm to compute a prediction and update its ``virtual'' weights is polynomial. Our method can be used to learn d-dimensional axis-parallel boxes when d is variable, and unions of d-dimensional axis-parallel boxes when d is constant. The worst-case number of mistakes of our algorithms for these classes is optimal to within a constant factor, and our algorithms inherit the noise robustness of Winnow. We believe that other on-line algorithms with multiplicative weight updates whose loss bounds grow logarithmically with the dimension are amenable to our methods.
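
For orientation, the sketch below shows the standard explicit Winnow update over n Boolean attributes: a linear threshold hypothesis with one weight per variable, promoted or demoted multiplicatively on each mistake. This is not the paper's implicit ("virtual weight") simulation, which avoids enumerating the exponentially many variables; the parameter names and the promotion factor alpha = 2 are illustrative defaults.

```python
import random

def winnow(examples, n, alpha=2.0):
    """Explicit Winnow with multiplicative promotion/demotion; returns the mistake count.

    examples: iterable of (x, y) with x a 0/1 list of length n and y in {0, 1}.
    Hypothesis: predict 1 iff sum_i w_i * x_i >= theta (a linear threshold gate).
    """
    w = [1.0] * n          # one weight per (virtual) variable
    theta = float(n)       # threshold of the linear threshold hypothesis
    mistakes = 0
    for x, y in examples:
        y_hat = 1 if sum(wi for wi, xi in zip(w, x) if xi) >= theta else 0
        if y_hat != y:
            mistakes += 1
            # Promote active weights on a false negative, demote on a false positive.
            factor = alpha if y == 1 else 1.0 / alpha
            w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
    return mistakes

# Usage: learn the 2-literal disjunction x0 OR x3 over n = 50 variables.
n = 50
random.seed(0)
stream = []
for _ in range(500):
    x = [random.randint(0, 1) for _ in range(n)]
    stream.append((x, 1 if (x[0] or x[3]) else 0))
print("mistakes:", winnow(stream, n))   # mistake bound grows like O(k log n), not O(n)
```

The reduction in the paper targets exactly this logarithmic dependence on the number of variables: because the mistake bound scales with log n rather than n, one can afford exponentially many variables provided the weighted sum and the update can be computed implicitly in polynomial time.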
