Developments in stochastic optimization algorithms with gradient approximations based on function measurements

There has recently been much interest in recursive optimization algorithms that rely on measurements of only the objective function, not requiring measurements of the gradient (or higher derivatives) of the objective function. The algorithms are implemented by forming an approximation to the gradient at each iteration that is based on the function measurements. Such algorithms have the advantage of not requiring detailed modeling information describing the relationship between the parameters to be optimized and the objective function. To properly cope with the noise that generally occurs in the measurements, these algorithms are best placed within a stochastic approximation framework. This paper discusses some of the main contributions to this class of algorithms, beginning in the early 1950s and progressing until now.
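As a rough illustration of the idea described above, the following sketch forms a two-sided finite-difference gradient approximation from noisy function measurements and uses it in a recursive update with decaying gains, in the spirit of the early-1950s work the abstract refers to. It is a minimal sketch under stated assumptions, not the method of any particular paper: the function names (`make_noisy_quadratic`, `fd_gradient`, `fdsa_minimize`), the quadratic test objective, the Gaussian noise model, and the gain sequences `a / k` and `c / k**gamma` with their default values are all hypothetical choices made for illustration.

```python
import random


def make_noisy_quadratic(target, sigma=0.1, seed=0):
    """Return a function giving noisy measurements of sum((theta - target)^2).

    Hypothetical test objective; only its measured values are ever used.
    """
    rng = random.Random(seed)

    def measure(theta):
        loss = sum((t - m) ** 2 for t, m in zip(theta, target))
        return loss + rng.gauss(0.0, sigma)  # additive measurement noise

    return measure


def fd_gradient(measure, theta, c_k):
    """Two-sided finite-difference gradient estimate.

    Uses 2 * len(theta) function measurements and no derivative information.
    """
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += c_k   # perturb coordinate i upward
        minus[i] -= c_k  # perturb coordinate i downward
        grad.append((measure(plus) - measure(minus)) / (2.0 * c_k))
    return grad


def fdsa_minimize(measure, theta0, n_iter=2000, a=0.5, c=0.1, gamma=0.25):
    """Recursive minimization using only noisy function measurements.

    The gains a_k = a / k and c_k = c / k**gamma are assumed values: both
    decay to zero, with c_k decaying more slowly so that the noise in the
    difference quotient is averaged out over the iterations.
    """
    theta = list(theta0)
    for k in range(1, n_iter + 1):
        a_k = a / k            # step size
        c_k = c / k ** gamma   # perturbation size
        grad = fd_gradient(measure, theta, c_k)
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta


measure = make_noisy_quadratic([1.0, -2.0], sigma=0.1, seed=0)
print(fdsa_minimize(measure, [0.0, 0.0]))
```

The update never inspects the form of the objective, only its measured values, which is the sense in which such algorithms avoid detailed modeling information. This finite-difference variant needs two measurements per parameter at every iteration, so its cost per step grows with the number of parameters being optimized.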
