Accelerated second-order stochastic optimization using only function measurements

Consider the problem of loss-function minimization when only (possibly noisy) measurements of the loss function are available; in particular, no measurements of the gradient of the loss function are assumed available. The simultaneous perturbation stochastic approximation (SPSA) algorithm has successfully addressed one of the major shortcomings of classical finite-difference SA algorithms by significantly reducing the number of loss measurements required in many multivariate problems of practical interest. This paper presents a second-order SPSA algorithm based on estimating both the gradient and the inverse Hessian of the loss function at each iteration. The aim of this approach is to emulate the acceleration properties associated with deterministic algorithms of Newton-Raphson form, particularly in the terminal phase, where the first-order SPSA algorithm slows in its convergence. The second-order SPSA algorithm requires only five loss-function measurements at each iteration, independent of the problem dimension. The paper is a significantly enhanced version of a second-order algorithm previously introduced by the author (1996).
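To make the iteration concrete, here is a minimal Python sketch of a second-order simultaneous-perturbation update. It follows the general 2SPSA recipe: a two-measurement SPSA gradient estimate, two further measurements to form a per-iteration Hessian estimate, a smoothed and positive-definite-corrected Hessian, and a Newton-Raphson-style step. The gain sequences, the eigenvalue-clipping fix, and the omission of the paper's fifth measurement and other safeguards are illustrative simplifications under stated assumptions, not the paper's exact specification.

```python
import numpy as np

def spsa2(loss, theta0, n_iter=1000, a=0.1, A=100, alpha=0.602,
          c=0.1, gamma=0.101, seed=0):
    """Minimal sketch of a second-order SPSA (2SPSA) iteration.

    `loss` returns a (possibly noisy) scalar measurement of the loss;
    no gradient information is used.  The gain constants and the
    positive-definiteness fix below are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    p = theta.size
    H_bar = np.eye(p)                      # running mean of Hessian estimates

    for k in range(n_iter):
        a_k = a / (k + 1 + A) ** alpha     # decaying step-size gain
        c_k = c / (k + 1) ** gamma         # decaying perturbation gain

        # Rademacher (+/-1) perturbation vectors for gradient and Hessian.
        delta = rng.choice([-1.0, 1.0], size=p)
        delta_t = rng.choice([-1.0, 1.0], size=p)

        # Two measurements -> simultaneous-perturbation gradient estimate.
        y_plus = loss(theta + c_k * delta)
        y_minus = loss(theta - c_k * delta)
        g_hat = (y_plus - y_minus) / (2.0 * c_k) / delta

        # Two more measurements -> one-sided gradient estimates at the
        # perturbed points, differenced to form a per-iteration Hessian
        # estimate (four measurements total in this simplified sketch;
        # the paper's fifth measurement and its role are omitted here).
        g1_plus = (loss(theta + c_k * (delta + delta_t)) - y_plus) / c_k / delta_t
        g1_minus = (loss(theta - c_k * delta + c_k * delta_t) - y_minus) / c_k / delta_t
        dg = g1_plus - g1_minus
        M = np.outer(dg / (2.0 * c_k), 1.0 / delta)
        H_hat = 0.5 * (M + M.T)            # symmetrize

        # Smooth the noisy Hessian estimates across iterations.
        H_bar = (k * H_bar + H_hat) / (k + 1)

        # Force positive definiteness before inverting (eigenvalue clipping;
        # an assumption standing in for the paper's mapping).
        w, V = np.linalg.eigh(H_bar)
        H_pd = (V * np.maximum(w, 1e-4)) @ V.T

        # Newton-Raphson-style step with the estimated (inverse) Hessian.
        theta = theta - a_k * np.linalg.solve(H_pd, g_hat)

    return theta
```

As a quick check, on a noisy quadratic such as `loss = lambda t: float(t @ t) + np.random.normal(scale=0.01)`, calling `spsa2(loss, np.ones(10))` should drive the iterate toward the origin; averaging the per-iteration Hessian estimates is what keeps the Newton-type step usable even though each individual estimate is very noisy.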
