Direct and indirect algorithms for on-line learning of disjunctions

It is easy to design on-line learning algorithms for learning k out of n variable monotone disjunctions by simply keeping one weight per disjunction. Such algorithms use roughly O(nk) weights which can be prohibitively expensive. Surprisingly, algorithms like Winnow require only n weights (one per variable or attribute) and the mistake bound of these algorithms is not too much worse than the mistake bound of the more costly algorithms. The purpose of this paper is to investigate how exponentially many weights can be collapsed into only O(n) weights. In particular, we consider probabilistic assumptions that enable the Bayes optimal algorithm's posterior over the disjunctions to be encoded with only O(n) weights. This results in a new O(n) algorithm for learning disjunctions which is related to the Bylander's BEG algorithm originally introduced for linear regression. Besides providing a Bayesian interpretation for this new algorithm, we are also able to obtain mistake bounds for the noise free case resembling those that have been derived for the Winnow algorithm. The same techniques used to derive this new algorithm also provide a Bayesian interpretation for a normalized version of Winnow.