Evolving Loss Functions with Multivariate Taylor Polynomial Parameterizations

Loss function optimization for neural networks has recently emerged as a new direction for metalearning, with Genetic Loss Optimization (GLO) providing a general approach for discovering and optimizing such functions. GLO represents loss functions as trees that are evolved with genetic programming and whose coefficients are then further optimized with an evolution strategy. However, searching in this space is difficult because most candidate trees are not valid loss functions. This paper introduces a new technique, Multivariate Taylor expansion-based genetic loss-function optimization (TaylorGLO), to solve this problem. TaylorGLO represents each loss function through a novel parameterization based on multivariate Taylor expansions, so every candidate in the search space is a valid, smooth function, which makes the search more effective (a sketch of such a parameterization follows below). TaylorGLO finds new loss functions that outperform those found by GLO in significantly fewer generations, demonstrating that loss function optimization is a productive avenue for metalearning.
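Since the abstract describes the parameterization only at a high level, the following is a minimal sketch of what a Taylor-polynomial loss representation might look like, assuming a third-order bivariate expansion in the true label and the network prediction. The names (`taylor_loss`, `theta`, `y_true`, `y_pred`) and the exact set of monomials are illustrative assumptions, not the authors' exact formulation.

```python
# A minimal sketch (assumed form, not the paper's exact parameterization):
# a per-sample loss defined as a third-order Taylor polynomial in
# (y_true, y_pred), with the expansion point and all partial-derivative
# coefficients collected into a single real-valued vector theta.
import numpy as np

def taylor_loss(y_true, y_pred, theta):
    """Evaluate a third-order bivariate Taylor-polynomial loss.

    theta[0:2] holds the expansion point (a, b); theta[2:] holds the
    coefficients of the nine monomials of total degree 1..3 (the constant
    term is omitted, since a constant offset does not affect gradients).
    """
    a, b = theta[0], theta[1]
    u, v = y_true - a, y_pred - b
    monomials = np.stack([
        u, v,                        # degree 1
        u * u, u * v, v * v,         # degree 2
        u ** 3, u * u * v, u * v * v, v ** 3,  # degree 3
    ])
    return theta[2:] @ monomials     # per-sample loss values

# Usage: 2 center values + 9 coefficients = an 11-dimensional candidate.
rng = np.random.default_rng(0)
theta = rng.standard_normal(11)
y_true = np.array([1.0, 0.0, 1.0])   # e.g., flattened one-hot targets
y_pred = np.array([0.9, 0.2, 0.6])   # network outputs
print(taylor_loss(y_true, y_pred, theta).mean())
```

Because `theta` is a fixed-length real vector, a black-box optimizer such as CMA-ES can search it directly, and, unlike tree mutation in GLO, every perturbation of `theta` yields another valid, differentiable loss function.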
