Regularized Evolutionary Population-Based Training

Metalearning of deep neural network (DNN) architectures and hyperparameters has become an increasingly important area of research. At the same time, network regularization has been recognized as crucial to effective DNN training. However, the role of metalearning in establishing effective regularization has not yet been fully explored. Recent evidence suggests that loss-function optimization could play this role; however, it is computationally impractical as an outer loop around full training. This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN's weights with the metalearning of loss functions. The loss functions are parameterized using multivariate Taylor expansions that EPBT can optimize directly. Such simultaneous adaptation of weights and loss functions can be deceptive, and therefore EPBT uses a quality-diversity heuristic called Novelty Pulsation, as well as knowledge distillation, to prevent overfitting during training. On the CIFAR-10 and SVHN image classification benchmarks, EPBT results in faster, more accurate learning. The discovered hyperparameters adapt to the training process and serve to regularize the learning task by discouraging overfitting to the labels. EPBT thus demonstrates a practical instantiation of regularization metalearning based on simultaneous training.
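As a concrete illustration of the parameterization named in the abstract, the Python sketch below encodes a loss function as a truncated bivariate Taylor expansion in the label and the network output, with a flat coefficient vector that an evolutionary outer loop such as EPBT could mutate and recombine directly. This is a minimal sketch under stated assumptions: the function name `taylor_loss`, the third-order truncation, and the coefficient layout are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def taylor_loss(y_true, y_pred, theta, order=3):
    """Loss defined by a bivariate Taylor expansion in (y_true, y_pred).

    theta is a flat coefficient vector, one entry per monomial of total
    degree <= order; an evolutionary outer loop could optimize it directly.
    """
    terms = []
    for i in range(order + 1):
        for j in range(order + 1 - i):
            terms.append((y_true ** i) * (y_pred ** j))   # monomial of degree i + j
    terms = np.stack(terms, axis=-1)                      # shape (..., num_terms)
    return float(np.mean(np.sum(theta * terms, axis=-1))) # weighted sum, averaged

# Hypothetical usage: one candidate coefficient vector in a population.
order = 3
num_terms = (order + 1) * (order + 2) // 2               # 10 terms for a third-order bivariate expansion
theta = np.random.randn(num_terms)
y_true = np.array([[0.0, 1.0], [1.0, 0.0]])              # one-hot labels
y_pred = np.array([[0.3, 0.7], [0.8, 0.2]])              # softmax outputs
print(taylor_loss(y_true, y_pred, theta, order=order))
```

Because the candidate loss is just a small coefficient vector, standard evolutionary operators (mutation, crossover, selection) can act on it directly while the DNN weights continue to be trained by gradient descent, which is the interleaving the abstract describes.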
