Limited Gradient Descent: Learning With Noisy Labels

Label noise can degrade the generalization of classifiers, so effectively learning the main patterns from samples with noisy labels is an important challenge. Recent studies have shown that deep neural networks tend to prioritize learning simple patterns before memorizing noise. This suggests a way to reach the best generalization: learn the main pattern and stop before the noise begins to be memorized. Traditional approaches often employ a clean validation set to find the best time to stop learning, i.e., early stopping. However, the generalization performance of such methods depends on the quality of the validation set, and in practice a clean validation set is sometimes difficult to obtain. To solve this problem, we propose limited gradient descent, a method that estimates the optimal stopping time without a clean validation set. We modify the labels of a few samples in the noisy dataset so that they are deliberately false, creating a reverse pattern. By monitoring the learning progress on the noisy and reverse samples, we can determine when to stop learning. We also theoretically derive some necessary conditions for learning with noisy labels. Experimental results on the CIFAR-10 and CIFAR-100 datasets demonstrate that our approach achieves generalization performance comparable to methods that rely on a clean validation set. Moreover, on the noisy Clothing-1M dataset, our approach surpasses methods that rely on a clean validation set.
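To make the stopping criterion concrete, the following is a minimal Python sketch of the idea described above, not the paper's exact algorithm: deliberately flip the labels of a small subset of the noisy training set to create "reverse" samples, then stop training once the network starts fitting them, since that indicates noise memorization has begun. The helper names (`make_reverse_set`, `train_one_epoch`, `eval_accuracy`) and the hyper-parameters (`frac`, `patience`) are illustrative assumptions.

```python
import numpy as np


def make_reverse_set(labels, num_classes, frac=0.02, rng=None):
    """Flip the labels of a small random fraction of samples to a wrong class,
    creating the 'reverse' pattern. `frac` is a hypothetical hyper-parameter."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    idx = rng.choice(len(labels), size=max(1, int(frac * len(labels))), replace=False)
    flipped = labels.copy()
    for i in idx:
        wrong = [c for c in range(num_classes) if c != labels[i]]
        flipped[i] = rng.choice(wrong)
    return idx, flipped


def limited_gradient_descent(train_one_epoch, eval_accuracy,
                             reverse_idx, max_epochs=100, patience=3):
    """Train until the network begins to memorize the reverse samples.

    `train_one_epoch(epoch)` runs one epoch on the (noisy + reverse) data;
    `eval_accuracy(indices)` returns training accuracy on the given samples.
    Both are caller-supplied callables assumed here for illustration."""
    best_epoch, rising, prev_rev_acc = 0, 0, 0.0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        rev_acc = eval_accuracy(reverse_idx)  # accuracy on deliberately mislabeled samples
        if rev_acc > prev_rev_acc:
            rising += 1                       # reverse pattern is being memorized
        else:
            rising, best_epoch = 0, epoch     # still learning the main pattern
        if rising >= patience:
            break                             # estimated optimal stopping point
        prev_rev_acc = rev_acc
    return best_epoch
```

In this sketch, the reverse samples play the role of a noise "probe": as long as their training accuracy stays low, gradient descent is still fitting the dominant clean pattern; once it rises consistently, further updates mainly memorize noise, so training is halted.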
