Evolutionary Architecture Search for Deep Multitask Networks

Multitask learning, i.e., learning several tasks at once with the same neural network, can improve performance on each of the tasks. Designing deep neural network architectures for multitask learning is a challenge: there are many ways to tie the tasks together, and the design choices matter. The size and complexity of this search space exceed human design ability, making it a compelling domain for evolutionary optimization. Using the existing state-of-the-art soft ordering architecture as the starting point, this paper evaluates methods for evolving the modules of this architecture and for evolving the overall topology, or routing, between modules. A synergistic approach of evolving custom routings with evolved, shared modules for each task turns out to be very powerful, significantly improving the state of the art in the Omniglot multitask, multialphabet character recognition domain. This result demonstrates how evolution can be instrumental in advancing the design of deep neural networks, and of complex systems in general.
