Dynamic Slimmable Network

Current dynamic networks and dynamic pruning methods have shown promising capability in reducing theoretical computational complexity. However, dynamic sparse patterns on convolutional filters fail to achieve actual acceleration in real-world implementations, due to the extra burden of indexing, weight-copying, or zero-masking. Here, we explore a dynamic network slimming regime, named Dynamic Slimmable Network (DS-Net), which aims at good hardware efficiency by dynamically adjusting the filter numbers of the network at test time with respect to different inputs, while keeping the filters stored statically and contiguously in hardware to avoid these overheads. DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate, which comprises an attention head and a slimming head to predictively adjust the network width with negligible extra computational cost. To ensure the generality of each candidate architecture and the fairness of the gate, we propose a disentangled two-stage training scheme inspired by one-shot NAS. In the first stage, a novel training technique for weight-sharing networks, named In-place Ensemble Bootstrapping, is proposed to improve supernet training efficacy. In the second stage, Sandwich Gate Sparsification is proposed to assist gate training by identifying easy and hard samples online. Extensive experiments demonstrate that DS-Net consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods by a large margin (up to 5.9%). Notably, DS-Net achieves 2-4× computation reduction and 1.62× real-world acceleration over ResNet-50 and MobileNet with minimal accuracy drops on ImageNet.
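To make the mechanism concrete, below is a minimal PyTorch sketch (ours, not the authors' released code) of the two ingredients named above: a slimmable convolution that runs at a reduced width by slicing the leading channels of a single contiguously stored weight tensor, and a double-headed gate whose attention head rescales channels while its slimming head picks one of several candidate widths. All names (SlimmableConv2d, DoubleHeadedGate) and hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SlimmableConv2d(nn.Conv2d):
    """A convolution whose active width can change per forward pass.

    The full weight tensor is stored once, contiguously. Running at a
    smaller width is just a slice of its leading channels, so no
    indexing, weight-copying, or zero-masking is needed at inference.
    """

    def forward(self, x, in_ch, out_ch):
        # Slice contiguous leading filters/channels (a view, not a copy).
        weight = self.weight[:out_ch, :in_ch]
        bias = self.bias[:out_ch] if self.bias is not None else None
        return F.conv2d(x, weight, bias, self.stride, self.padding)


class DoubleHeadedGate(nn.Module):
    """Sketch of a double-headed gate: a shared bottleneck feeds an
    SE-style attention head and a slimming head that selects one of
    `num_widths` candidate widths from globally pooled features."""

    def __init__(self, channels, num_widths, reduction=16):
        super().__init__()
        self.fc = nn.Linear(channels, channels // reduction)
        self.attn_head = nn.Linear(channels // reduction, channels)
        self.slim_head = nn.Linear(channels // reduction, num_widths)

    def forward(self, x):
        z = F.relu(self.fc(x.mean(dim=(2, 3))))   # global average pooling
        attn = torch.sigmoid(self.attn_head(z))   # channel attention in [0, 1]
        width_idx = self.slim_head(z).argmax(1)   # hard width choice at test time
        return attn, width_idx


# Illustrative usage: run the layer at 48 of 64 input and 96 of 128
# output channels, modulated by the gate's attention head.
conv = SlimmableConv2d(64, 128, kernel_size=3, padding=1, bias=False)
gate = DoubleHeadedGate(channels=48, num_widths=4)
x = torch.randn(2, 48, 32, 32)
attn, width_idx = gate(x)
y = conv(x * attn[:, :, None, None], in_ch=48, out_ch=96)

Because weight[:96, :48] is a view into contiguous memory, no gather, copy, or mask is ever materialized, which is the hardware-efficiency argument made above. One caveat the sketch glosses over: batching samples that the slimming head routes to different widths is nontrivial; a simple deployment strategy is to infer with batch size 1 or to group samples by predicted width.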
