Agnostically Learning Halfspaces

We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, and Sellie, where a learner is given access to labeled examples drawn from a distribution, with no restriction on the labels (e.g., adversarial noise). The algorithm constructs a hypothesis whose error rate on future examples is within an additive ε of that of the optimal halfspace, in time poly(n) for any constant ε > 0, under the uniform distribution over {-1, 1}^n or the unit sphere in ℝ^n, as well as under any log-concave distribution over ℝ^n. It also agnostically learns Boolean disjunctions in time 2^{Õ(√n)} with respect to any distribution. The new algorithm, essentially L_1 polynomial regression, is a noise-tolerant, arbitrary-distribution generalization of the "low-degree" Fourier algorithm of Linial, Mansour, and Nisan. We also give a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere with the best current bounds on the tolerable rate of "malicious noise".
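The core procedure, L_1 polynomial regression, can be sketched as follows: expand each example into low-degree monomial features, fit a polynomial minimizing the sum of absolute residuals against the ±1 labels (an L_1 fit, expressible as a linear program), and output the sign of the fitted polynomial. This is an illustrative sketch, not the authors' implementation; the SciPy-based LP formulation, the function names, and the choice of multilinear features are assumptions made here.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def monomial_features(X, degree):
    """Multilinear monomials prod_{i in S} x_i for all |S| <= degree.

    Over {-1, 1}^n these span all polynomials of the given degree.
    """
    n = X.shape[1]
    cols = [np.ones(len(X))]  # the constant (empty-set) monomial
    for d in range(1, degree + 1):
        for S in itertools.combinations(range(n), d):
            cols.append(np.prod(X[:, S], axis=1))
    return np.column_stack(cols)

def l1_poly_regress(X, y, degree):
    """Fit coefficients c minimizing sum_i |p(x_i) - y_i| via a linear program.

    LP variables are [c (m coeffs), u (N slacks)]; we minimize sum(u)
    subject to -u <= A c - y <= u, which makes u_i = |p(x_i) - y_i|
    at the optimum.
    """
    A = monomial_features(X, degree)
    N, m = A.shape
    obj = np.concatenate([np.zeros(m), np.ones(N)])
    A_ub = np.block([[A, -np.eye(N)], [-A, -np.eye(N)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * m + [(0, None)] * N
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:m]

def predict(X, coeffs, degree):
    """Hypothesis: the sign of the fitted polynomial."""
    return np.sign(monomial_features(X, degree) @ coeffs)
```

On data labeled by a halfspace with a fraction of adversarially flipped labels, a degree-1 fit of this kind typically achieves error close to the noise rate; the paper's guarantees use higher degrees (growing with 1/ε) and hold under the stated distributional assumptions.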

[1] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[2] Leslie G. Valiant, et al. Learning Disjunction of Conjunctions, 1985, IJCAI.

[3] D. Clark, et al. Estimates of the Hermite and the Freud polynomials, 1990.

[4] Eric B. Baum, et al. The Perceptron Algorithm is Fast for Nonmalicious Distributions, 1990, Neural Computation.

[5] Yuh-Dauh Lyuu, et al. The Transition to Perfect Generalization in Perceptrons, 1991, Neural Computation.

[6] Noam Nisan, et al. On the degree of boolean functions as real polynomials, 1992, STOC '92.

[7] Robert E. Schapire, et al. On the Sample Complexity of Weakly Learning, 1995, Inf. Comput.

[8] Ramamohan Paturi, et al. On the degree of polynomials that approximate symmetric Boolean functions (preliminary version), 1992, STOC '92.

[9] Linda Sellie, et al. Toward efficient agnostic learning, 1992, COLT '92.

[10] Pavel Pudlák, et al. Threshold circuits of bounded depth, 1987, 28th Annual Symposium on Foundations of Computer Science (FOCS 1987).

[11] Ming Li, et al. Learning in the Presence of Malicious Errors, 1993, SIAM J. Comput.

[12] Noam Nisan, et al. Constant depth circuits, Fourier transform, and learnability, 1993, JACM.

[13] Scott E. Decatur. Statistical queries and faulty PAC oracles, 1993, COLT '93.

[14] Yishay Mansour, et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis, 1994, STOC '94.

[15] Philip M. Long. On the sample complexity of PAC learning half-spaces against the uniform distribution, 1995, IEEE Trans. Neural Networks.

[16] Peter L. Bartlett, et al. On efficient agnostic learning of linear combinations of basis functions, 1995, COLT '95.

[17] Peter L. Bartlett, et al. Efficient agnostic learning of neural networks with bounded fan-in, 1996, IEEE Trans. Inf. Theory.

[18] Nader H. Bshouty, et al. On the Fourier spectrum of monotone functions, 1996, JACM.

[19] J. C. Jackson. The harmonic sieve: a novel application of Fourier analysis to machine learning theory and practice, 1996.

[20] Jeffrey C. Jackson. An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution, 1997, J. Comput. Syst. Sci.

[21] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[22] M. Kearns. Efficient noise-tolerant learning from statistical queries, 1998, JACM.

[23] Vladimir Vapnik. Statistical learning theory, 1998.

[24] Alan M. Frieze, et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions, 1996, Algorithmica.

[25] Yishay Mansour, et al. Learning Conjunctions with Noise under Product Distributions, 1998, Inf. Process. Lett.

[26] Rocco A. Servedio, et al. Boosting and hard-core sets, 1999, 40th Annual Symposium on Foundations of Computer Science (FOCS 1999).

[27] Rocco A. Servedio, et al. On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm, 1999, COLT '99.

[28] Yoav Freund, et al. A Short Introduction to Boosting, 1999.

[29] V. Zinoviev, et al. Codes on euclidean spheres, 2001.

[30] Rocco A. Servedio, et al. Learnability beyond AC0, 2002, STOC '02.

[31] Philip M. Long. An upper bound on the sample complexity of PAC-learning halfspaces with respect to the uniform distribution, 2003, Inf. Process. Lett.

[32] Ryan O'Donnell, et al. New degree bounds for polynomial threshold functions, 2003, STOC '03.

[33] Santosh S. Vempala, et al. Logconcave functions: geometry and efficient sampling algorithms, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2003).

[34] Avrim Blum. Machine learning: my favorite results, directions, and open problems, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2003).

[35] Adam Tauman Kalai, et al. Noise-tolerant learning, the parity problem, and the statistical query model, 2000, STOC '00.

[36] Adam R. Klivans, et al. Learning intersections and thresholds of halfspaces, 2002, 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2002).

[37] K. Clarkson. Subgradient and sampling algorithms for l1 regression, 2005, SODA '05.