Multi-layer Perceptron Error Surfaces: Visualization, Structure and Modelling

The Multi-Layer Perceptron (MLP) is one of the most widely applied and researched Artificial Neural Network models. MLP networks are normally applied to supervised learning tasks, which involve iterative training methods to adjust the connection weights within the network. This is commonly formulated as a multivariate non-linear optimization problem over a very high-dimensional space of possible weight configurations. By analogy with mathematical optimization, training an MLP is often described as the search of an error surface for a weight vector which gives the smallest possible error value. Although this presents a useful notion of the training process, there are many problems associated with using the error surface to understand the behaviour of learning algorithms and the properties of MLP mappings themselves. Because of the high dimensionality of the system, many existing methods of analysis are not well suited to this problem. Visualizing and describing the error surface are also nontrivial and problematic. These problems are specific to complex systems such as neural networks, which contain large numbers of adjustable parameters, and the investigation of such systems in this way is largely a developing area of research. In this thesis, the concept of the error surface is explored using three related methods. Firstly, Principal Component Analysis (PCA) is proposed as a method for visualizing the learning trajectory followed by an algorithm on the error surface. It is found that PCA provides an effective method for performing such a visualization, as well as providing an indication of the significance of individual weights to the training process. Secondly, sampling methods are used to explore the error surface and to measure certain of its properties, providing the necessary data for an intuitive description of the error surface.
A number of practical MLP error surfaces are found to contain a high degree of ultrametric structure, in common with other known configuration spaces of complex systems. Thirdly, a class of global optimization algorithms is developed, which is focused on the construction and evolution of a model of the error surface (or search space) as an integral part of the optimization process. The relationships between this algorithm class, the Population-Based Incremental Learning algorithm, evolutionary algorithms and cooperative search are discussed. The work provides important practical techniques for exploration of the error surfaces of MLP networks. These techniques can be used to examine the dynamics of different training algorithms and the complexity of MLP mappings, and to build an intuitive description of the nature of the error surface. The configuration spaces of other complex systems are also amenable to many of these techniques. Finally, the algorithmic framework provides a powerful paradigm for visualization of the optimization process and the development of parallel coupled optimization algorithms which apply knowledge of the error surface to solving the optimization problem.
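
The first method, PCA of a learning trajectory, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `pca_project`, the toy quadratic error, the learning rate and the use of NumPy are all assumptions made for illustration. The idea shown is only that the weight vectors recorded during training form a (steps x weights) matrix, and projecting that matrix onto its leading principal components gives a low-dimensional view of the path taken through weight space.

```python
import numpy as np

def pca_project(trajectory, n_components=2):
    """Project a (steps x weights) trajectory onto its leading principal components."""
    W = np.asarray(trajectory, dtype=float)
    centered = W - W.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per component
    coords = centered @ vt[:n_components].T  # low-dimensional trajectory
    return coords, vt[:n_components], explained[:n_components]

# Hypothetical stand-in for a recorded training run: plain gradient descent
# on a toy quadratic error 0.5 * sum(curvature * w**2) in five weights.
rng = np.random.default_rng(0)
curvature = np.array([10.0, 1.0, 0.1, 0.01, 0.001])
w = rng.normal(size=5)
trajectory = [w.copy()]
for _ in range(200):
    w = w - 0.05 * curvature * w  # gradient step
    trajectory.append(w.copy())

coords, components, explained = pca_project(trajectory)
```

Plotting the two columns of `coords` against each other gives the projected trajectory. The magnitudes of the entries in `components` indicate how strongly each weight contributes to the dominant directions of movement, which is one plausible reading of the "significance of individual weights" mentioned above.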
