Universality in halting time and its applications in optimization

We present empirical universal distributions for the halting time (measured by the number of iterations needed to reach a given accuracy) of optimization algorithms applied to two random systems: spin glasses and deep learning. Given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time follow a distribution that remains unchanged even when the input is changed drastically. We observe two main universality classes: a Gumbel-like distribution that appears in Google searches, human decision times, the QR eigenvalue algorithm and spin glasses, and a Gaussian-like distribution that appears in the conjugate gradient method, deep networks with MNIST input data and deep networks with random input data.
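The measurement described above can be sketched for the conjugate gradient case. The halting time τ is the first iteration k at which the residual norm falls below a tolerance ε, and its fluctuations are compared after centring and scaling, τ̃ = (τ − mean(τ)) / std(τ). The following is a minimal sketch under stated assumptions, not the authors' code: the Wishart-type matrices A = XXᵀ/m, the sizes n = 100 and m = 400, the tolerance ε = 1e-10, the Gaussian and ±1 Bernoulli entry distributions, and all function names (`cg_halting_time`, `sample_halting_times`, `normalized_fluctuations`) are hypothetical choices made for illustration. If the Gaussian-like class reported for conjugate gradient holds in this setting, the printed quantiles for the two input ensembles should roughly agree.

```python
import numpy as np


def cg_halting_time(A, b, eps=1e-10, max_iter=10_000):
    """Number of conjugate gradient iterations until ||r_k||_2 < eps.

    A must be symmetric positive definite; the iteration starts from x_0 = 0.
    """
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float) - A @ x    # initial residual r_0 = b
    p = r.copy()                   # first search direction
    rs = r @ r
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < eps:
            return k               # halting time: first k with a small residual
        p = r + (rs_new / rs) * p  # next A-conjugate search direction
        rs = rs_new
    return max_iter


def gaussian_entries(rng, shape):
    return rng.standard_normal(shape)


def bernoulli_entries(rng, shape):
    return rng.choice([-1.0, 1.0], size=shape)


def sample_halting_times(n, m, n_samples, entry_sampler, rng, eps=1e-10):
    """Halting times of CG on random A = X X^T / m (hypothetical ensemble choice)."""
    times = []
    for _ in range(n_samples):
        X = entry_sampler(rng, (n, m))
        A = X @ X.T / m             # positive definite with high probability for m > n
        b = rng.standard_normal(n)
        b /= np.linalg.norm(b)      # right-hand side uniform on the unit sphere
        times.append(cg_halting_time(A, b, eps=eps))
    return np.array(times)


def normalized_fluctuations(times):
    """Centre and scale halting times: (tau - mean) / std."""
    times = np.asarray(times, dtype=float)
    sigma = times.std()
    if sigma == 0:
        raise ValueError("halting times have zero variance")
    return (times - times.mean()) / sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    qs = [0.1, 0.25, 0.5, 0.75, 0.9]
    for name, sampler in [("gaussian", gaussian_entries), ("bernoulli", bernoulli_entries)]:
        z = normalized_fluctuations(sample_halting_times(100, 400, 300, sampler, rng))
        print(name, np.round(np.quantile(z, qs), 2))
```

Comparing quantiles of the normalized fluctuations, rather than raw halting times, is what makes the comparison meaningful: the mean and spread of τ change with the ensemble, while the claim of universality concerns only the shape of the rescaled distribution.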
