Adaptive stochastic approximation by the simultaneous perturbation method

Stochastic approximation (SA) has long been applied to problems of minimizing loss functions or finding roots with noisy input information. As with all stochastic search algorithms, there are adjustable algorithm coefficients that must be specified and that can have a profound effect on algorithm performance. It is known that choosing these coefficients according to an SA analog of the deterministic Newton-Raphson algorithm provides an optimal or near-optimal form of the algorithm. However, directly determining the required Hessian matrix (or Jacobian matrix for root finding) to achieve this algorithm form has often been difficult or impossible in practice. The paper presents a general adaptive SA algorithm that is based on a simple method for estimating the Hessian matrix while concurrently estimating the primary parameters of interest. The approach applies in both the gradient-free optimization (Kiefer-Wolfowitz) and root-finding/stochastic gradient-based (Robbins-Monro) settings, and is based on the "simultaneous perturbation (SP)" idea introduced previously. The algorithm requires only a small number of loss function or gradient measurements per iteration, independent of the problem dimension, to adaptively estimate the Hessian and the parameters of primary interest. Aside from introducing the adaptive SP approach, the paper presents practical implementation guidance, asymptotic theory, and a nontrivial numerical evaluation. Also included is a discussion and numerical analysis comparing the adaptive SP approach with the iterate-averaging approach to accelerated SA.
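To make the dimension-independence claim concrete, the following is a minimal Python sketch of a simultaneous perturbation Hessian estimate in the gradient-based (Robbins-Monro) setting, together with a running average of the per-iteration estimates. It is an illustration under stated assumptions rather than the paper's exact algorithm: the function names (`bernoulli_perturbation`, `sp_hessian_estimate`, `running_average`), the fixed perturbation size `c`, the symmetric +1/-1 perturbation distribution, and the quadratic test problem are all chosen here for the example. The Newton-type parameter update and any step that forces the averaged estimate to be positive definite are omitted.

```python
import random


def bernoulli_perturbation(p, rng):
    """Draw a perturbation vector with independent +1/-1 components."""
    return [rng.choice((-1.0, 1.0)) for _ in range(p)]


def sp_hessian_estimate(grad, theta, c, delta):
    """Symmetric per-iteration Hessian estimate from two gradient evaluations.

    grad  : callable returning a (possibly noisy) gradient as a list
    theta : current parameter estimate (list of floats)
    c     : small positive perturbation size (an assumed tuning constant)
    delta : perturbation vector with nonzero components
    """
    p = len(theta)
    # Only two gradient evaluations are used, whatever the dimension p.
    g_plus = grad([t + c * d for t, d in zip(theta, delta)])
    g_minus = grad([t - c * d for t, d in zip(theta, delta)])
    dg = [a - b for a, b in zip(g_plus, g_minus)]
    # raw[i][j] = dg[j] / (2 c delta[i]); the result is then symmetrized.
    raw = [[dg[j] / (2.0 * c * delta[i]) for j in range(p)] for i in range(p)]
    return [[0.5 * (raw[i][j] + raw[j][i]) for j in range(p)] for i in range(p)]


def running_average(h_bar, h_hat, k):
    """Fold the k-th estimate (k = 0, 1, ...) into the running mean."""
    w = 1.0 / (k + 1)
    p = len(h_hat)
    return [[(1.0 - w) * h_bar[i][j] + w * h_hat[i][j] for j in range(p)]
            for i in range(p)]


if __name__ == "__main__":
    # Hypothetical test problem: L(theta) = 0.5 * theta^T A theta, gradient A theta.
    A = [[2.0, 1.0], [1.0, 3.0]]

    def quad_grad(th):
        return [sum(A[i][j] * th[j] for j in range(2)) for i in range(2)]

    rng = random.Random(0)
    theta = [0.5, -1.0]
    h_bar = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(200):
        delta = bernoulli_perturbation(2, rng)
        h_hat = sp_hessian_estimate(quad_grad, theta, 0.1, delta)
        h_bar = running_average(h_bar, h_hat, k)
    print(h_bar)
```

A single estimate is generally a poor approximation of the Hessian, which is why the estimates are averaged across iterations. For a noise-free quadratic loss such as the one in the demo, averaging the estimate over every sign pattern of the perturbation recovers the Hessian up to rounding, because the cross terms cancel.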
