Teaching with Commentaries

Effective training of deep neural networks can be challenging, and many open questions remain on how best to train these models. Recently developed methods to improve neural network training examine teaching: providing learned information during the training process to improve downstream model performance. In this paper, we take steps towards extending the scope of teaching. We propose a flexible teaching framework using commentaries, meta-learned information helpful for training on a particular task or dataset. We present an efficient and scalable gradient-based method to learn commentaries, leveraging recent work on implicit differentiation. We explore diverse applications of commentaries, from learning weights for individual training examples, to parameterizing label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. In these settings, we find that commentaries can improve training speed and/or performance and also provide fundamental insights about the dataset and training process.
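To make the framework concrete, below is a minimal sketch of commentary learning posed as bilevel optimization, using per-example training weights as the commentary. This is an illustrative assumption rather than the paper's implementation: the names `weighted_loss` and `commentary_update` are hypothetical, and where the paper leverages implicit differentiation for scalability, this sketch simply unrolls a single differentiable inner step (assuming PyTorch 2.x for `torch.func.functional_call`).

```python
import torch
import torch.nn.functional as F

def weighted_loss(model, x, y, weights):
    # Per-example cross-entropy, scaled by softmax-normalized commentary weights.
    per_example = F.cross_entropy(model(x), y, reduction="none")
    return (torch.softmax(weights, dim=0) * per_example).sum()

def commentary_update(model, commentary, train_batch, val_batch, inner_lr, meta_opt):
    """One meta-step on the commentary (per-example weight) parameters."""
    x_tr, y_tr, idx = train_batch   # idx selects this batch's commentary entries
    x_val, y_val = val_batch

    # Inner step: one differentiable SGD step on the commentary-weighted loss.
    inner_loss = weighted_loss(model, x_tr, y_tr, commentary[idx])
    grads = torch.autograd.grad(inner_loss, list(model.parameters()),
                                create_graph=True)

    # Evaluate the post-update parameters on validation data via a
    # functional forward pass (buffers fall back to the module's own).
    updated = {name: p - inner_lr * g
               for (name, p), g in zip(model.named_parameters(), grads)}
    val_logits = torch.func.functional_call(model, updated, (x_val,))
    val_loss = F.cross_entropy(val_logits, y_val)

    # Outer step: update only the commentary to reduce validation loss.
    meta_opt.zero_grad()
    val_loss.backward()
    meta_opt.step()
    # This backward also populates model-parameter grads we do not use here;
    # clear them so they do not leak into the model's own training step.
    model.zero_grad(set_to_none=True)
    return val_loss.item()
```

In use, something like `commentary = torch.nn.Parameter(torch.zeros(len(train_set)))` and `meta_opt = torch.optim.Adam([commentary], lr=1e-2)` would be interleaved with ordinary training steps on the commentary-weighted loss. The implicit-differentiation route the abstract mentions avoids storing the inner computation graph, which matters when the inner optimization runs for more than one step.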
