A survey of randomized algorithms for training neural networks

As a powerful tool for regression and classification, neural networks have received considerable attention from researchers in fields such as machine learning, statistics, and computer vision. A large body of research addresses network training, most of it based on tuning the parameters iteratively; such methods often suffer from local minima and slow convergence. It has been shown that randomization-based training methods can significantly boost the performance or efficiency of neural networks. Most of these approaches use randomization either to transform the data distribution or to fix a subset of the parameters or network configurations at random. This article presents a comprehensive survey covering both the earliest work and recent advances, along with some suggestions for future research.
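
To make the second flavor of randomization concrete, the sketch below implements a minimal single-hidden-layer network in the spirit of random-weight feedforward nets and the random vector functional link (RVFL) family: the hidden weights are drawn once at random and frozen, so training reduces to a convex ridge-regression problem for the output weights. The function names, the uniform sampling range, and the tanh activation are illustrative assumptions for this sketch, not the prescription of any particular surveyed method.

```python
import numpy as np

def fit_random_weight_net(X, y, n_hidden=100, ridge=1e-3, seed=None):
    """Hidden weights are sampled once and never updated; only the
    output weights are learned, in closed form, by ridge regression."""
    rng = np.random.default_rng(seed)
    # Randomly fixed part of the network: hidden weights and biases.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)  # random nonlinear features of the inputs
    # Trained part: output weights via the regularized normal equations.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Usage: fit a noisy sine curve with 50 random hidden units.
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.default_rng(0).normal(size=200)
W, b, beta = fit_random_weight_net(X, y, n_hidden=50, seed=0)
print(np.mean((predict(X, W, b, beta) - y) ** 2))  # small training error
```

Because the only trained parameters enter linearly, this approach sidesteps the local minima and slow convergence of iterative gradient-based training noted above, typically at the cost of requiring more hidden units than a fully trained network.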
