AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks

Neural networks require careful weight initialization to prevent signals from exploding or vanishing. Existing initialization schemes solve this problem in specific cases by assuming that the network has a certain activation function or topology. Such weight initialization strategies are difficult to derive, so modern architectures often reuse these same schemes even when their assumptions do not hold. This paper introduces AutoInit, a weight initialization algorithm that automatically adapts to different neural network architectures. By analytically tracking the mean and variance of signals as they propagate through the network, AutoInit scales the weights at each layer appropriately to avoid exploding or vanishing signals. Experiments demonstrate that AutoInit improves the performance of various convolutional and residual networks across a range of activation function, dropout, weight decay, learning rate, and normalizer settings. Further, in neural architecture search and activation function meta-learning, AutoInit automatically calculates specialized weight initialization strategies for thousands of unique architectures and hundreds of unique activation functions, and improves performance in vision, language, tabular, multi-task, and transfer learning scenarios. AutoInit thus serves as an automatic configuration tool that makes the design of new neural network architectures more robust. The AutoInit package provides a wrapper around existing TensorFlow models and is available at https://github.com/cognizant-ai-labs/autoinit.
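
To make the mean-and-variance tracking concrete, the sketch below rescales the kernels of a plain Dense-plus-ReLU Keras stack so that each pre-activation keeps roughly a unit second moment, propagating closed-form rectified-Gaussian moments forward layer by layer. It is a minimal illustration of the general idea under stated assumptions, not the released AutoInit algorithm: the helper names (relu_moments, scale_dense_relu_stack), the restriction to Dense layers with ReLU activations, and the unit-second-moment target are all choices made here for brevity.

import math

import numpy as np
import tensorflow as tf


def relu_moments(mean, var):
    # Mean and variance of relu(z) for z ~ N(mean, var), using the standard
    # closed-form moments of a rectified Gaussian.
    std = math.sqrt(var)
    t = mean / std
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    new_mean = mean * cdf + std * pdf
    second_moment = (mean ** 2 + var) * cdf + mean * std * pdf
    return new_mean, max(second_moment - new_mean ** 2, 1e-12)


def scale_dense_relu_stack(model, in_mean=0.0, in_var=1.0, seed=0):
    # Walk a sequential stack of Dense(..., activation="relu") layers,
    # rescale each kernel so the pre-activation second moment stays near 1,
    # then propagate the post-ReLU moments to the next layer.
    rng = np.random.default_rng(seed)
    mean, var = in_mean, in_var
    for layer in model.layers:
        if not isinstance(layer, tf.keras.layers.Dense):
            continue
        weights = layer.get_weights()
        kernel = weights[0]
        fan_in = kernel.shape[0]
        # With zero-mean i.i.d. weights, the pre-activation second moment is
        # fan_in * sigma_w^2 * E[x^2]; choose sigma_w so that this equals 1.
        second_in = mean ** 2 + var
        sigma_w = math.sqrt(1.0 / (fan_in * second_in))
        new_weights = [rng.normal(0.0, sigma_w, size=kernel.shape)]
        if len(weights) > 1:                      # keep a zero bias if present
            new_weights.append(np.zeros_like(weights[1]))
        layer.set_weights(new_weights)
        mean, var = relu_moments(0.0, 1.0)        # moments after the ReLU
    return model


# Example usage (assumed shapes): a three-layer ReLU MLP on 32-dimensional inputs.
mlp = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                           tf.keras.layers.Dense(64, activation="relu"),
                           tf.keras.layers.Dense(10, activation="relu")])
mlp.build(input_shape=(None, 32))
scale_dense_relu_stack(mlp)

The released package applies this kind of moment propagation across general layer types and topologies through the TensorFlow model wrapper mentioned above, rather than the Dense-only walk shown here.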
