Universal halting times in optimization and machine learning

We present empirical distributions for the halting time (measured by the number of iterations to reach a given accuracy) of optimization algorithms applied to two random systems: spin glasses and deep learning. Given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time follow a distribution that, after centering and scaling, remains unchanged even when the distribution on the landscape is changed. We observe two qualitative classes: a Gumbel-like distribution that appears in Google searches, human decision times, the QR eigenvalue algorithm, and spin glasses; and a Gaussian-like distribution that appears in the conjugate gradient method, deep networks with MNIST input data, and deep networks with random input data. This empirical evidence suggests the presence of a class of distributions for which, under some conditions, the normalized halting time is independent of the underlying distribution.
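The centering and scaling described above can be illustrated with a minimal sketch. It is not the authors' experimental setup. The sample covariance ensemble, the two entry distributions (Gaussian and Rademacher), the matrix sizes, the tolerance, and all function names are assumptions chosen for illustration. The sketch records the number of conjugate gradient iterations needed to reach a residual tolerance on random linear systems, then subtracts the sample mean and divides by the sample standard deviation so that the fluctuations from the two ensembles can be compared on a common scale.

```python
import math
import random
import statistics


def gaussian(rng):
    """Standard normal entry (illustrative choice of ensemble)."""
    return rng.gauss(0.0, 1.0)


def rademacher(rng):
    """Entry equal to +1 or -1 with equal probability (illustrative choice)."""
    return rng.choice((-1.0, 1.0))


def sample_covariance(n, m, draw, rng):
    """Return the n x n matrix X X^T / m, where X is n x m with i.i.d. entries."""
    X = [[draw(rng) for _ in range(m)] for _ in range(n)]
    return [[sum(a * b for a, b in zip(X[i], X[j])) / m for j in range(n)]
            for i in range(n)]


def cg_halting_time(A, b, tol=1e-6, max_iter=500):
    """Count conjugate gradient iterations on A x = b until ||residual|| < tol.

    Only the residual is tracked, since the halting time does not need
    the iterate itself. Returns max_iter if the tolerance is never reached.
    """
    n = len(b)
    r = list(b)          # residual for the starting guess x = 0
    p = list(r)          # first search direction
    rs = sum(v * v for v in r)
    for k in range(1, max_iter + 1):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        if math.sqrt(rs_new) < tol:
            return k
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return max_iter


def halting_times(draw, samples, n, m, seed=0):
    """Halting times of CG over independent random systems from one ensemble."""
    rng = random.Random(seed)
    times = []
    for _ in range(samples):
        A = sample_covariance(n, m, draw, rng)
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in b))
        times.append(cg_halting_time(A, [v / norm for v in b]))
    return times


def center_and_scale(times):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mu = statistics.mean(times)
    sigma = statistics.pstdev(times)
    if sigma == 0:
        raise ValueError("all halting times are equal; no fluctuations to scale")
    return [(t - mu) / sigma for t in times]


if __name__ == "__main__":
    for name, draw in (("gaussian", gaussian), ("rademacher", rademacher)):
        raw = halting_times(draw, samples=40, n=20, m=24, seed=1)
        z = center_and_scale(raw)
        print(name, "mean:", statistics.mean(raw), "std:", statistics.pstdev(raw))
        print(name, "normalized range:", min(z), max(z))
```

At these small sizes the halting time takes only a handful of integer values, so the sketch shows the normalization step rather than reproducing any histogram. Comparing the shapes of the two normalized samples in a meaningful way would require much larger matrices and many more samples.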
