Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming

We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that must be tuned for specific applications. The difficulty is that in most dynamic programming applications, observations for estimating a value function come from a data series that can be highly transient at the outset. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence, and it can vary widely among the value function parameters of the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules and derives a formula for the optimal stepsize that minimizes estimation error. The formula assumes certain parameters are known, and an approximation is proposed for the case where they are not. Experimental work shows that the approximation achieves faster convergence than other popular formulas.
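To make the idea concrete, below is a minimal sketch of one bias/variance-balancing adaptive stepsize of the kind the abstract describes: the stepsize stays near 1 while the estimate is strongly biased (the transient phase) and decays toward a 1/n averaging rate once observation noise dominates. This is an illustrative reconstruction under stated assumptions, not the paper's exact algorithm; the smoothing constant nu and the recursive estimators of the bias and noise variance are choices made for the example.

import random

def adaptive_stepsize_estimation(observations, nu=0.1):
    """Recursive estimation with a bias/variance-adjusted stepsize.

    Illustrative sketch only: `nu` and the bias/variance estimators
    below are assumed choices, not the paper's exact specification.
    """
    theta = observations[0]  # current smoothed estimate
    beta = 0.0               # recursive estimate of the bias of theta
    delta = 0.0              # recursive estimate of the mean squared error
    lam = 1.0                # coefficient relating Var(theta) to the noise variance
    for n, y in enumerate(observations[1:], start=1):
        err = y - theta
        beta = (1 - nu) * beta + nu * err       # smoothed bias
        delta = (1 - nu) * delta + nu * err**2  # smoothed squared error
        # split the squared error into noise variance and squared bias
        sigma2 = max(delta - beta**2, 1e-12) / (1 + lam)
        # stepsize: close to 1 when bias dominates, small when noise dominates
        alpha = 1.0 - sigma2 / ((1 + lam) * sigma2 + beta**2)
        alpha = min(max(alpha, 1.0 / (n + 1)), 1.0)  # keep within sane bounds
        lam = (1 - alpha)**2 * lam + alpha**2  # update variance coefficient
        theta = (1 - alpha) * theta + alpha * y
    return theta

if __name__ == "__main__":
    random.seed(0)
    # transient series: the mean rises from 0 toward 10, plus unit noise
    data = [10 * (1 - 0.9**n) + random.gauss(0, 1) for n in range(500)]
    print(adaptive_stepsize_estimation(data))  # converges near 10

Note that when the bias estimate is zero the recursion reduces to alpha = lam / (1 + lam), which reproduces the 1/(n+1) stepsize of simple averaging; a large bias pushes alpha back toward 1, which is the behavior a tuned fixed formula cannot deliver across value function parameters with different degrees of transience.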
