An algorithm is a weak learner if, with some small probability, it outputs a hypothesis with error slightly below 50%. This paper presents sufficient conditions for weak learning.
Our main result requires a “consistency oracle” for the concept class <inline-equation><f><ge>F</ge></f></inline-equation> which decides, for a given set of examples, whether there is a concept in <inline-equation><f><ge>F</ge></f></inline-equation> consistent with the examples. We show that such an oracle can be used to construct a computationally efficient weak learning algorithm for <inline-equation><f><ge>F</ge></f></inline-equation> if <inline-equation><f><ge>F</ge></f></inline-equation> is learnable at all. We consider consistency oracles which are allowed to give wrong answers and discuss how the number of incorrect answers affects the oracle's use in computationally efficient weak learning algorithms.
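To make the oracle notion concrete, here is a minimal sketch of a consistency oracle for one toy concept class, monotone conjunctions over <inline-equation><f><ge>n</ge></f></inline-equation> Boolean variables. The paper treats the class <inline-equation><f><ge>F</ge></f></inline-equation> abstractly; this particular class, the function name, and its interface are illustrative assumptions, not the paper's construction.

```python
def consistent_conjunction_exists(examples, n):
    """Consistency oracle for a toy class: monotone conjunctions
    over n Boolean variables.  Decides whether SOME conjunction of
    variables is consistent with the labeled examples.

    examples: list of (x, label) pairs, x a tuple of n bits.
    """
    positives = [x for x, y in examples if y]
    negatives = [x for x, y in examples if not y]

    # Most specific candidate: the conjunction of every variable
    # that is 1 in all positive examples (all variables if there
    # are no positives).  Any consistent conjunction must use a
    # subset of these variables, so it suffices to test this one.
    if positives:
        h = {i for i in range(n) if all(x[i] for x in positives)}
    else:
        h = set(range(n))

    # h is true on all positives by construction; it is consistent
    # iff it is false on every negative, i.e. every negative has
    # some variable of h set to 0.
    return all(any(x[i] == 0 for i in h) for x in negatives)
```

The oracle only answers the decision question "is some concept in the class consistent?"; it does not itself output a hypothesis, which is exactly the interface the main result assumes.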
We also define “weak Occam algorithms” which, when given a set of <?Pub Fmt italic>m<?Pub Fmt /italic> examples, select a consistent hypothesis from some class of 2<?Pub Fmt italic><supscrpt>m-(1/p(m))</supscrpt><?Pub Fmt /italic> possible hypotheses. We show that these weak Occam algorithms are also weak learners. In contrast, we show that an Occam-style algorithm which selects a consistent hypothesis from a class of 2<?Pub Fmt italic><supscrpt>m+1</supscrpt><?Pub Fmt /italic>-2 hypotheses is not a weak learner.
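The intuition for why a hypothesis class of size 2 to the power m-(1/p(m)) yields weak learning can be sketched with a standard Occam-style counting argument (this is the generic union bound, not necessarily the paper's exact proof):

```latex
Let $H_m$ be the hypothesis class, $|H_m| \le 2^{m - 1/p(m)}$.
A fixed $h$ with error $\ge \tfrac12 - \gamma$ under the target
distribution is consistent with $m$ i.i.d.\ examples with
probability at most $(\tfrac12 + \gamma)^m$.  By the union bound,
\[
  \Pr\bigl[\exists\, h \in H_m \text{ consistent with }
           \mathrm{err}(h) \ge \tfrac12 - \gamma \bigr]
  \;\le\; 2^{\,m - 1/p(m)} \bigl(\tfrac12 + \gamma\bigr)^m
  \;=\; 2^{-1/p(m)} (1 + 2\gamma)^m
  \;\le\; e^{\,2\gamma m - (\ln 2)/p(m)} .
\]
Choosing, say, $\gamma = \tfrac{1}{4\,m\,p(m)}$ makes the exponent
negative, so with positive probability every consistent hypothesis
has error below $\tfrac12 - \gamma$: an inverse-polynomial advantage
over random guessing, which is exactly the weak-learning guarantee.
\]
```

The same bound collapses when the class has roughly 2 to the power m members, since the 2 to the power -1/p(m) savings disappears, which is consistent with the negative result for classes of 2 to the power (m+1) minus 2 hypotheses.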