Strength from Weakness: Fast Learning Using Weak Supervision

We study the generalization properties of weakly supervised learning, that is, learning in which only a few "strong" labels (the actual target of our prediction) are present but many more "weak" labels are available. In particular, we show that access to weak labels can significantly accelerate the learning rate on the strong task to the fast rate of $\mathcal{O}(\nicefrac1n)$, where $n$ denotes the number of strongly labeled data points. This acceleration can occur even when the strongly labeled data alone admits only the slower $\mathcal{O}(\nicefrac{1}{\sqrt{n}})$ rate. The degree of acceleration depends continuously on the number of weak labels available and on the relationship between the two tasks. Our theoretical results are reflected empirically across a range of tasks, illustrating how weak labels speed up learning on the strong task.
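The qualitative claim above can be illustrated with a toy simulation. The sketch below is not the paper's estimator; it is a hypothetical numpy example in which a linear target is estimated from a few low-noise "strong" labels, with or without many high-noise "weak" labels pooled into the same least-squares fit. All names (`estimate`, the noise levels, sample sizes) are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                  # dimension of the linear model
w_true = rng.normal(size=d)            # ground-truth parameter

def estimate(n_strong, n_weak):
    """Squared parameter error of least squares on strong (+ optional weak) data."""
    # Strong labels: few samples, low label noise.
    Xs = rng.normal(size=(n_strong, d))
    ys = Xs @ w_true + 0.5 * rng.normal(size=n_strong)
    if n_weak:
        # Weak labels: many samples, but much noisier supervision.
        Xw = rng.normal(size=(n_weak, d))
        yw = Xw @ w_true + 2.0 * rng.normal(size=n_weak)
        X, y = np.vstack([Xs, Xw]), np.concatenate([ys, yw])
    else:
        X, y = Xs, ys
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((w_hat - w_true) ** 2))

trials = 200
err_strong_only = np.mean([estimate(10, 0) for _ in range(trials)])
err_with_weak = np.mean([estimate(10, 1000) for _ in range(trials)])
print(f"strong only: {err_strong_only:.3f}, with weak labels: {err_with_weak:.3f}")
```

In this toy setting, pooling many noisy weak labels with the handful of strong labels yields a markedly smaller parameter error on average, mirroring (in spirit, not in rate) the abstract's claim that weak supervision accelerates learning on the strong task.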
