Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond

Many machine learning tasks can be viewed as taking, at each training iteration, a gradient step toward minimizing the average loss over a batch of examples. When the data are noisy, this uniform treatment of examples can cause the model to overfit noisy examples, which tend to have larger loss values, and thus generalize poorly. Inspired by the expert setting in online learning, we present a flexible approach to learning from noisy examples: we treat each training example as an expert and maintain a distribution over all examples. We alternate between updating the model parameters via gradient descent and updating the example weights via the exponentiated gradient (EG) update. Unlike related methods, our approach handles a general class of loss functions and applies to a wide range of noise types and applications. We demonstrate its efficacy in multiple learning settings, namely noisy principal component analysis and a variety of noisy classification problems.
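
The alternating scheme described above can be sketched concretely. Below is a minimal NumPy illustration on a toy linear-regression problem with a few corrupted labels, using the standard multiplicative EG/Hedge-style update w_i <- w_i * exp(-eta * loss_i) followed by renormalization. The step sizes `eta_theta` and `eta_w`, the squared loss, and the toy data are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the alternating scheme: gradient descent on model
# parameters, exponentiated-gradient update on example weights.
# Hyperparameters and the toy problem are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear targets, with 20 grossly corrupted labels.
n, d = 200, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)
noisy = rng.choice(n, size=20, replace=False)
y[noisy] += rng.normal(scale=10.0, size=20)   # injected label noise

theta = np.zeros(d)            # model parameters
w = np.full(n, 1.0 / n)        # distribution over all examples ("experts")
eta_theta, eta_w = 0.5, 0.05   # step sizes (assumed)

for step in range(500):
    residual = X @ theta - y
    losses = 0.5 * residual ** 2               # per-example loss

    # (1) Gradient-descent step on the weighted loss sum_i w_i * loss_i.
    grad_theta = X.T @ (w * residual)
    theta -= eta_theta * grad_theta

    # (2) Exponentiated-gradient step on the example weights:
    #     w_i <- w_i * exp(-eta_w * loss_i), then renormalize to a distribution.
    w *= np.exp(-eta_w * losses)
    w /= w.sum()

# High-loss (noisy) examples should end up with much smaller weight.
clean = np.setdiff1d(np.arange(n), noisy)
print("mean weight (clean):", w[clean].mean())
print("mean weight (noisy):", w[noisy].mean())
```

Because the update is multiplicative, examples whose loss stays large (the corrupted ones) are driven toward negligible weight, while the renormalization keeps the weights a proper distribution over the batch.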
