Recurrent Parameter Generators

We present a generic method for recurrently reusing the same parameters across many different convolution layers to build a deep network. Specifically, for a given network, we create a recurrent parameter generator (RPG) from which the parameters of each convolution layer are generated. Although using recurrent models to build deep convolutional neural networks (CNNs) is not entirely new, our method achieves significant performance gains over existing works. We demonstrate how to build a one-layer neural network that achieves performance comparable to traditional CNN models on various applications and datasets. Such a method allows us to build an arbitrarily complex neural network with any number of parameters. For example, we build a ResNet34 with model parameters reduced by more than 400 times that still achieves 41.6% ImageNet top-1 accuracy. Furthermore, we demonstrate that the RPG can be applied at different scales, such as layers, blocks, or even sub-networks. Specifically, we use the RPG to build a ResNet18 network whose number of weights is equivalent to that of one convolutional layer of a conventional ResNet, and show that this model achieves 67.2% ImageNet top-1 accuracy. The proposed method can be viewed as an inverse approach to model compression: rather than removing unused parameters from a large model, it aims to squeeze more information into a small number of parameters. Extensive experimental results demonstrate the power of the proposed recurrent parameter generator.
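To make the mechanism concrete, below is a minimal PyTorch sketch of the core idea: a single trainable parameter bank shared by every convolution layer, with each layer's kernel generated from that bank through a fixed, layer-specific index permutation and sign flip. The class names (`RPG`, `RPGConv2d`) and the permutation-and-sign generation scheme are illustrative assumptions for this sketch, not necessarily the paper's exact construction.

```python
# Minimal sketch of a recurrent parameter generator (RPG).
# Assumption: each layer draws its kernel from one shared trainable bank
# via a fixed (seeded) random index permutation and sign flip.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPG(nn.Module):
    """A shared bank of trainable parameters; all layers are generated from it."""
    def __init__(self, bank_size):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.01)

    def generate(self, shape, seed):
        # Deterministic per-layer mapping: the seed fixes which bank entries
        # (and signs) each kernel element uses, so it is stable across forwards.
        n = math.prod(shape)
        g = torch.Generator().manual_seed(seed)
        idx = torch.randint(0, self.bank.numel(), (n,), generator=g).to(self.bank.device)
        sign = (torch.randint(0, 2, (n,), generator=g) * 2 - 1).to(self.bank.device)
        return (self.bank[idx] * sign).view(*shape)

class RPGConv2d(nn.Module):
    """Convolution layer whose kernel is generated from the shared bank."""
    def __init__(self, rpg, in_ch, out_ch, k, seed, stride=1, padding=0):
        super().__init__()
        self.rpg, self.seed = rpg, seed
        self.shape = (out_ch, in_ch, k, k)
        self.stride, self.padding = stride, padding

    def forward(self, x):
        w = self.rpg.generate(self.shape, self.seed)
        return F.conv2d(x, w, stride=self.stride, padding=self.padding)

# Usage: two layers share one bank, so the model's trainable parameter count
# is the bank size, not the sum of kernel sizes; gradients from both layers
# accumulate in the same bank entries.
rpg = RPG(bank_size=4096)
conv1 = RPGConv2d(rpg, 3, 16, 3, seed=0, padding=1)
conv2 = RPGConv2d(rpg, 16, 32, 3, seed=1, padding=1)
y = conv2(conv1(torch.randn(1, 3, 32, 32)))
```

Because every generated kernel is a differentiable view of the same bank, standard backpropagation trains the shared parameters directly; the bank size, not the network depth, sets the parameter budget.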
