[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[2] Joel Nothman,et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python , 2019, ArXiv.
[3] Frank Hutter,et al. Neural Architecture Search: A Survey , 2018, J. Mach. Learn. Res..
[4] Risto Miikkulainen,et al. Evolving Neural Networks through Augmenting Topologies , 2002, Evolutionary Computation.
[5] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.
[6] Martin Wistuba,et al. A Survey on Neural Architecture Search , 2019, ArXiv.
[7] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[8] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[9] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Saining Xie,et al. On Network Design Spaces for Visual Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[11] Jascha Sohl-Dickstein,et al. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks , 2018, ICML.
[12] Elliot Meyerson,et al. Evolving Deep Neural Networks , 2017, Artificial Intelligence in the Age of Neural Networks and Brain Computing.
[13] Yoshua Bengio,et al. How to Initialize your Network? Robust Initialization for WeightNorm & ResNets , 2019, NeurIPS.
[14] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[15] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[16] Kenneth A. De Jong,et al. Cooperative Coevolution: An Architecture for Evolving Coadapted Subcomponents , 2000, Evolutionary Computation.
[17] Joshua B. Tenenbaum,et al. Human-level concept learning through probabilistic program induction , 2015, Science.
[18] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[19] Samuel L. Smith,et al. Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks , 2020, NeurIPS.
[20] Samuel S. Schoenholz,et al. The Dynamics of Signal Propagation in Gated Recurrent Neural Networks , 2019 .
[21] Garrison W. Cottrell,et al. ReZero is All You Need: Fast Convergence at Large Depth , 2020, UAI.
[22] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.
[23] Lei Huang,et al. Centered Weight Normalization in Accelerating Training of Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[24] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[25] Thomas Brox,et al. Striving for Simplicity: The All Convolutional Net , 2014, ICLR.
[26] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Quoc V. Le,et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent , 2017, ICLR.
[28] Risto Miikkulainen,et al. Evolutionary optimization of deep learning activation functions , 2020, GECCO.
[29] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[30] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[31] Samuel L. Smith,et al. Characterizing signal propagation to close the performance gap in unnormalized ResNets , 2021, ICLR.
[32] K. Singleton,et al. An omnibus test for the two-sample problem using the empirical characteristic function , 1986 .
[33] Roberto Prevete,et al. A survey on modern trainable activation functions , 2020, Neural Networks.
[34] Quoc V. Le,et al. Evolving Normalization-Activation Layers , 2020, NeurIPS.
[35] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[36] Boris Hanin,et al. Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients? , 2018, NeurIPS.
[37] Samuel S. Schoenholz,et al. Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks , 2018, ICML.
[38] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[39] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[40] Elliot Meyerson,et al. Evolutionary neural AutoML for deep learning , 2019, GECCO.
[41] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[42] H. B. Mann,et al. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other , 1947 .
[43] Masato Taki. Deep Residual Networks and Weight Initialization , 2017, ArXiv.
[44] Risto Miikkulainen,et al. Forming Neural Networks Through Efficient and Adaptive Coevolution , 1997, Evolutionary Computation.
[45] Quoc V. Le,et al. Smooth Adversarial Training , 2020, ArXiv.
[46] K. Jarrod Millman,et al. Array programming with NumPy , 2020, Nature.
[47] Stephen Marshall,et al. Activation Functions: Comparison of trends in Practice and Research for Deep Learning , 2018, ArXiv.
[48] Sepp Hochreiter,et al. Self-Normalizing Neural Networks , 2017, NIPS.
[49] Jeffrey Pennington,et al. Nonlinear random matrix theory for deep learning , 2019, NIPS.
[50] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[51] Jeffrey Pennington,et al. The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network , 2018, NeurIPS.
[52] Risto Miikkulainen,et al. Discovering Parametric Activation Functions , 2020, Neural Networks.
[53] Quoc V. Le,et al. Don't Decay the Learning Rate, Increase the Batch Size , 2017, ICLR.
[54] Peter M. Roth,et al. The Quest for the Golden Activation Function , 2018, ArXiv.
[55] Surya Ganguli,et al. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice , 2017, NIPS.
[56] Tengyu Ma,et al. Fixup Initialization: Residual Learning Without Normalization , 2019, ICLR.
[57] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[58] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .
[59] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[60] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[61] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[62] Kenji Doya,et al. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning , 2017, Neural Networks.
[63] Lucas Dixon,et al. Ex Machina: Personal Attacks Seen at Scale , 2016, WWW.
[64] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[65] Trevor Darrell,et al. Data-dependent Initializations of Convolutional Neural Networks , 2015, ICLR.
[66] Kevin Gimpel,et al. Adjusting for Dropout Variance in Batch Normalization and Weight Initialization , 2016 .
[67] David Rolnick,et al. How to Start Training: The Effect of Initialization and Architecture , 2018, NeurIPS.
[68] Quoc V. Le,et al. Searching for Activation Functions , 2018, ArXiv.
[69] Ron Kohavi,et al. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid , 1996, KDD.
[70] Robert Piessens,et al. Quadpack: A Subroutine Package for Automatic Integration , 2011 .
[71] Yoshua Bengio,et al. Three Factors Influencing Minima in SGD , 2017, ArXiv.
[72] Jeffrey Pennington,et al. Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks , 2020, ICLR.
[73] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[74] Jiri Matas,et al. All you need is a good init , 2015, ICLR.
[75] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[76] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[77] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[78] Sepp Hochreiter,et al. Untersuchungen zu dynamischen neuronalen Netzen , 1991 .
[79] Randal S. Olson,et al. PMLB: a large benchmark suite for machine learning evaluation and comparison , 2017, BioData Mining.
[80] F. Massey. The Kolmogorov-Smirnov Test for Goodness of Fit , 1951 .