Comparative study of stochastic algorithms for system optimization based on gradient approximations

Stochastic approximation (SA) algorithms can be used in system optimization problems for which only noisy measurements of the system are available and the gradient of the loss function is not. Problems of this type arise in adaptive control, neural network training, experimental design, stochastic optimization, and many other areas. This paper studies three types of SA algorithms in a multivariate Kiefer-Wolfowitz setting, which uses only noisy measurements of the loss function (i.e., no loss function gradient measurements). The algorithms considered are the standard finite-difference SA (FDSA) and two accelerated algorithms: the random directions SA (RDSA) and the simultaneous-perturbation SA (SPSA). RDSA and SPSA use randomized gradient approximations based on (generally) far fewer function measurements than FDSA in each iteration. This paper describes the asymptotic error distribution for a class of RDSA algorithms, and compares the RDSA, SPSA, and FDSA algorithms theoretically (using mean-square errors computed from asymptotic distributions) and numerically. Based on the theoretical and numerical results, SPSA is the preferable algorithm to use.
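
To illustrate why RDSA and SPSA need far fewer loss measurements per iteration than FDSA, the three gradient approximations can be sketched as below. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the two-sided differences, the Bernoulli ±1 perturbations for SPSA, the directions drawn uniformly on a sphere of radius sqrt(p) for RDSA, the gain sequences, and all function and parameter names are chosen here for illustration only.

```python
import numpy as np


def fdsa_gradient(loss, theta, c, rng=None):
    """Finite-difference estimate: perturbs one coordinate at a time,
    so it uses 2p loss measurements for a p-dimensional theta."""
    p = theta.size
    g = np.zeros(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = 1.0
        g[i] = (loss(theta + c * e) - loss(theta - c * e)) / (2.0 * c)
    return g


def spsa_gradient(loss, theta, c, rng):
    """Simultaneous-perturbation estimate: perturbs all coordinates at once
    with random +/-1 entries (an assumption), using 2 loss measurements."""
    delta = rng.choice([-1.0, 1.0], size=theta.size)
    diff = loss(theta + c * delta) - loss(theta - c * delta)
    return diff / (2.0 * c * delta)


def rdsa_gradient(loss, theta, c, rng):
    """Random-directions estimate: perturbs along one random direction d,
    here uniform on the sphere of radius sqrt(p) so that E[d d^T] = I
    (an assumption), using 2 loss measurements."""
    p = theta.size
    d = rng.standard_normal(p)
    d *= np.sqrt(p) / np.linalg.norm(d)
    diff = loss(theta + c * d) - loss(theta - c * d)
    return diff / (2.0 * c) * d


def sa_minimize(grad_est, loss, theta0, n_iter,
                a=0.5, c=0.1, alpha=1.0, gamma=1.0 / 6.0, rng=None):
    """Generic SA recursion theta <- theta - a_k * g_k(theta).
    The decaying gains a_k and c_k are illustrative assumptions."""
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        a_k = a / (k + 1) ** alpha
        c_k = c / (k + 1) ** gamma
        theta = theta - a_k * grad_est(loss, theta, c_k, rng)
    return theta
```

A typical call would pass a noisy loss, for example `sa_minimize(spsa_gradient, noisy_loss, theta0, 1000, rng=np.random.default_rng(0))`, where `noisy_loss` is a hypothetical function returning the loss plus measurement noise. The per-iteration cost is the point of the comparison: the FDSA sketch calls the loss 2p times, while the SPSA and RDSA sketches call it twice regardless of p.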
