Exploiting fractalness of error surfaces: New methods for neural network learning

Learning in neural networks can be formulated as global optimization of a multimodal error function over a high-dimensional space of connection weights. A general scaling model is developed that describes the error surface as high-dimensional fractional Brownian motion (FBM), i.e., as a class of random fractals. The parameter of FBM can be extracted by spectral analysis of the error profile over a random walk in weight space. Scaling structure within the error surface has important implications for stochastic optimization methods such as Boltzmann learning. Experimental data that confirm the fractalness of error surfaces for a wide range of problems and connection topologies are reviewed, and the implications of these results are discussed.
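The spectral-analysis step can be illustrated with a minimal sketch. It is not the authors' procedure: the 2-2-1 XOR network, the step size, the frequency cutoff, and all function names are hypothetical choices made for illustration. The sketch also assumes the standard relation for a one-dimensional FBM trace, whose power spectrum falls off as 1/f^beta with beta = 2H + 1, so that the scaling parameter H can be read off the log-log slope of the periodogram.

```python
import cmath
import math
import random


def xor_error(w):
    """Sum-squared error of a hypothetical 2-2-1 tanh network on XOR (9 weights)."""
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    err = 0.0
    for (x1, x2), target in patterns:
        h1 = math.tanh(w[0] * x1 + w[1] * x2 + w[2])
        h2 = math.tanh(w[3] * x1 + w[4] * x2 + w[5])
        out = math.tanh(w[6] * h1 + w[7] * h2 + w[8])
        err += (out - target) ** 2
    return err


def error_profile(error_fn, dim, steps, step_size, rng):
    """Record the error along a Gaussian random walk in weight space."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    profile = []
    for _ in range(steps):
        profile.append(error_fn(w))
        w = [wi + rng.gauss(0.0, step_size) for wi in w]
    return profile


def spectral_exponent(series, max_fraction=0.125):
    """Estimate beta in S(f) ~ 1/f^beta from a log-log fit to the periodogram.

    Only the lowest `max_fraction` of frequencies is used (an assumed cutoff),
    because the power law is expected to hold best at low frequencies.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [s - mean for s in series]
    xs, ys = [], []
    for k in range(1, int(n * max_fraction) + 1):
        # Naive discrete Fourier transform at frequency index k.
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        xs.append(math.log(k))
        ys.append(math.log(max(abs(coeff) ** 2, 1e-300)))
    # Ordinary least-squares slope of log power against log frequency.
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return -slope


if __name__ == "__main__":
    profile = error_profile(xor_error, dim=9, steps=512, step_size=0.05,
                            rng=random.Random(1))
    beta = spectral_exponent(profile)
    # Assumed FBM relation beta = 2H + 1; meaningful only if 1 < beta < 3.
    print("beta =", beta, "H =", (beta - 1.0) / 2.0)
```

A single periodogram is noisy, so in practice one would probably average the estimate over many independent walks before interpreting H. An estimate of beta outside the range 1 to 3 would suggest that the FBM model does not describe the profile at the sampled scales.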
