论文信息 - BlockQNN: Efficient Block-Wise Neural Network Architecture Generation

BlockQNN: Efficient Block-Wise Neural Network Architecture Generation

Convolutional neural networks have gained a remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN which automatically builds high-performance networks using the Q-Learning paradigm with epsilon-greedy exploration strategy. The optimal network block is constructed by the learning agent which is trained to choose component layers sequentially. We stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early stop strategy. The block-wise generation brings unique advantages: (1) it yields state-of-the-art results in comparison to the hand-crafted networks on image classification, particularly, the best network generated by BlockQNN achieves 2.35% top-1 error rate on CIFAR-10. (2) it offers tremendous reduction of the search space in designing networks, spending only 3 days with 32 GPUs. A faster version can yield a comparable result with only 1 GPU in 20 hours. (3) it has strong generalizability in that the network built on CIFAR also performs well on the larger-scale dataset. The best network achieves very competitive accuracy of 82.0% top-1 and 96.0% top-5 on ImageNet.

[1] Marcin Andrychowicz,et al. Learning to learn by gradient descent by gradient descent , 2016, NIPS.

[2] Jakob Verbeek,et al. Convolutional Neural Fabrics , 2016, NIPS.

[3] Ricardo Vilalta,et al. A Perspective View and Survey of Meta-Learning , 2002, Artificial Intelligence Review.

[4] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5] Graham W. Taylor,et al. Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[6] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Frank Hutter,et al. Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves , 2015, IJCAI.

[8] Ameet Talwalkar,et al. Random Search and Reproducibility for Neural Architecture Search , 2019, UAI.

[9] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.

[10] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Sepp Hochreiter,et al. Learning to Learn Using Gradient Descent , 2001, ICANN.

[12] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.

[13] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[14] Bohyung Han,et al. Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[16] Quoc V. Le,et al. Efficient Neural Architecture Search via Parameter Sharing , 2018, ICML.

[17] Ramesh Raskar,et al. Designing Neural Network Architectures using Reinforcement Learning , 2016, ICLR.

[18] Frank Hutter,et al. Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution , 2018, ICLR.

[19] Song Han,et al. Path-Level Network Transformation for Efficient Architecture Search , 2018, ICML.

[20] Masanori Suganuma,et al. A genetic programming approach to designing convolutional neural network architectures , 2017, GECCO.

[21] Li Fei-Fei,et al. Progressive Neural Architecture Search , 2017, ECCV.

[22] Quoc V. Le,et al. Faster Discovery of Neural Architectures by Searching for Paths in a Large Model , 2018, ICLR.

[23] Theodore Lim,et al. SMASH: One-Shot Model Architecture Search through HyperNetworks , 2017, ICLR.

[24] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[27] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[28] Yong Yu,et al. Efficient Architecture Search by Network Transformation , 2017, AAAI.

[29] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.

[30] Wei Wu,et al. Practical Block-Wise Neural Network Architecture Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31] Dahua Lin,et al. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Yiming Yang,et al. DARTS: Differentiable Architecture Search , 2018, ICLR.

[33] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .

[35] Alok Aggarwal,et al. Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[36] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[37] Vijay Vasudevan,et al. Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Quoc V. Le,et al. Understanding and Simplifying One-Shot Architecture Search , 2018, ICML.

[40] Alan L. Yuille,et al. Genetic CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43] Quoc V. Le,et al. Large-Scale Evolution of Image Classifiers , 2017, ICML.

[44] Oriol Vinyals,et al. Hierarchical Representations for Efficient Architecture Search , 2017, ICLR.

[45] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[46] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[47] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[48] Ramesh Raskar,et al. Accelerating Neural Architecture Search using Performance Prediction , 2017, ICLR.

[49] François Chollet,et al. Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Song Han,et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[51] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52] Ramesh Raskar,et al. Practical Neural Network Performance Prediction for Early Stopping , 2017, ArXiv.

[53] Risto Miikkulainen,et al. Evolving Neural Networks through Augmenting Topologies , 2002, Evolutionary Computation.

[54] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..

[55] J. D. Schaffer,et al. Combinations of genetic algorithms and neural networks: a survey of the state of the art , 1992, [Proceedings] COGANN-92: International Workshop on Combinations of Genetic Algorithms and Neural Networks.

[56] Yoshua Bengio,et al. Algorithms for Hyper-Parameter Optimization , 2011, NIPS.

[57] Kenneth O. Stanley,et al. A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks , 2009, Artificial Life.

[58] Liang Lin,et al. SNAS: Stochastic Neural Architecture Search , 2018, ICLR.

[59] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..

[60] Luca Bertinetto,et al. Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[61] Frank Hutter,et al. Simple And Efficient Architecture Search for Convolutional Neural Networks , 2017, ICLR.

[62] Frank Hutter,et al. Neural Architecture Search: A Survey , 2018, J. Mach. Learn. Res..

[63] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[64] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[65] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.

[66] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[67] Aaron Klein,et al. Learning Curve Prediction with Bayesian Neural Networks , 2016, ICLR.

[68] Jian Sun,et al. Convolutional neural networks at constrained time cost , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69] Qiang Chen,et al. Network In Network , 2013, ICLR.

[70] Shuicheng Yan,et al. Dual Path Networks , 2017, NIPS.

[71] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[72] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[73] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74] Aaron Q. Li,et al. Parameter Server for Distributed Machine Learning , 2013 .