Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning

Humans are able to master a variety of knowledge and skills through ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed catastrophic forgetting, is one of the major roadblocks that prevent deep neural networks from achieving human-level artificial intelligence. Several research efforts (e.g., lifelong or continual learning algorithms) have attempted to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, require storing an excessive number of model parameters as historical memory, or fail to obtain competitive performance on the new tasks. In this letter, we focus on the incremental multitask image classification scenario. Inspired by the learning process of students, who usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and the high-level semantic features serve as soft targets and guide the training process in multiple stages, providing sufficient supervisory information about the old tasks and helping to reduce forgetting. Owing to knowledge distillation and regularization effects, the proposed method even outperforms fine-tuning on the new tasks, which makes it stand out from other methods. Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms state-of-the-art methods in both accuracy on new tasks and performance preservation on old tasks.
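
The abstract does not give the exact formulation, but the described design combines three ingredients: supervision on the new task, the old model's outputs as soft targets (knowledge distillation), and adversarial alignment of intermediate features against a frozen copy of the old network. Below is a minimal PyTorch-style sketch of one plausible instantiation. All names and signatures here (FeatureDiscriminator, the tuple outputs of old_model and new_model, the loss weights lambda_adv and lambda_kd, the temperature T) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDiscriminator(nn.Module):
    """Hypothetical discriminator that tries to tell whether a pooled
    feature vector came from the frozen old model or the new model."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, feat):
        # feat: (batch, feat_dim) pooled low-level feature vector
        return self.net(feat)

def distillation_loss(new_logits, old_logits, T=2.0):
    """Soft-target (Hinton-style) distillation on the old task's head."""
    p_old = F.softmax(old_logits / T, dim=1)
    log_p_new = F.log_softmax(new_logits / T, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * T * T

def training_step(x, y_new, old_model, new_model, disc,
                  lambda_adv=0.1, lambda_kd=1.0):
    """One illustrative update combining the three loss terms the
    abstract mentions. Assumes old_model(x) -> (features, logits) and
    new_model(x) -> (features, old-head logits, new-head logits)."""
    with torch.no_grad():
        old_feat, old_logits = old_model(x)  # frozen reference network
    new_feat, new_logits_old_head, new_logits_new_head = new_model(x)

    # (1) Standard cross-entropy supervision on the new task.
    loss_new = F.cross_entropy(new_logits_new_head, y_new)

    # (2) High-level semantic features as soft targets (distillation).
    loss_kd = distillation_loss(new_logits_old_head, old_logits)

    # (3) Adversarial alignment of low-level visual features: the new
    # model tries to make its features indistinguishable from the old
    # model's, i.e., fool the discriminator into predicting "real".
    real = torch.ones(x.size(0), 1, device=x.device)
    loss_adv = F.binary_cross_entropy_with_logits(disc(new_feat), real)

    return loss_new + lambda_kd * loss_kd + lambda_adv * loss_adv
```

In an actual training loop, the discriminator would be updated in an alternating step to classify old_feat as real and new_feat as fake, following standard GAN practice; the loss weights are placeholders to be tuned per task sequence.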
