On the geometry of output-code multi-class learning

We provide a new perspective on the popular multi-class algorithmic techniques one-vs-all and (error-correcting) output codes. We show that in the cases where these techniques succeed at learning from labeled data, they implicitly assume structure on how the classes are related. By making that structure explicit, we can design algorithms that recover the classes from limited labeled data. We provide results for commonly studied cases where the codewords of the classes are well separated: learning a linear one-vs-all classifier for data on the unit ball, and learning a linear error-correcting output code when the Hamming distance between the codewords is large (at least $d+1$ in a $d$-dimensional problem). We additionally consider the more challenging case where the codewords are not well separated but satisfy a boundary-features condition.
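As a rough illustration of the setting (not the paper's recovery algorithm), the sketch below shows standard output-code multi-class prediction: each class is assigned a binary codeword, one linear classifier is trained per codeword bit, and a test point is labeled with the class whose codeword is nearest in Hamming distance to the predicted bit vector. One-vs-all corresponds to the identity code matrix. The synthetic unit-ball data, the code matrix, and the use of scikit-learn's LogisticRegression are illustrative assumptions.

```python
# Minimal sketch of (error-correcting) output-code multi-class prediction.
# The data, code matrix, and base learner are assumptions for illustration,
# not the construction analyzed in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic 3-class data on the unit ball (classes = angular sectors).
X = rng.normal(size=(300, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)
angles = np.arctan2(X[:, 1], X[:, 0])
y = np.digitize(angles, bins=[-np.pi / 3, np.pi / 3])  # labels 0, 1, 2

# Code matrix: one row (codeword) per class, one column per binary classifier.
# This is the one-vs-all (identity) code; an ECOC would use longer,
# well-separated codewords.
code = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])

# Train one linear classifier per codeword bit.
bit_clfs = [LogisticRegression().fit(X, code[y, b]) for b in range(code.shape[1])]

def predict(X_new):
    # Predict a bit vector per point, then decode to the nearest codeword
    # in Hamming distance.
    bits = np.column_stack([clf.predict(X_new) for clf in bit_clfs])
    hamming = (bits[:, None, :] != code[None, :, :]).sum(axis=2)
    return hamming.argmin(axis=1)

print(predict(X[:5]))
```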
