Paper Information - Error Bounds for Approximate Value Iteration

Error Bounds for Approximate Value Iteration

Approximate Value Iteration (AVI) is a method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. A sequence of value representations V_n is computed iteratively by V_{n+1} = A T V_n, where T is the Bellman operator and A is an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and that of the optimal policy are given as a function of weighted Lp-norms (p ≥ 1) of the approximation errors. The results extend the usual analysis in L∞-norm and make it possible to relate the performance of AVI to the approximation power (usually expressed in Lp-norm, for p = 1 or 2) of the SL algorithm. We illustrate the tightness of these bounds on an optimal replacement problem.
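The iteration V_{n+1} = A T V_n can be sketched on a small finite MDP. The code below is only a minimal illustration under stated assumptions, not the paper's experimental setup: the random MDP, the polynomial features `Phi`, the uniform weighting `mu`, and the use of weighted least squares as the approximation operator A are all hypothetical choices made for this example. It also records the per-iteration approximation errors V_{n+1} - T V_n, whose weighted Lp-norms are the quantities the bounds are expressed in.

```python
import numpy as np

def q_values(V, P, R, gamma):
    """Q(x, a) = R[x, a] + gamma * sum_y P[a, x, y] * V[y]."""
    return R + gamma * np.einsum("axy,y->xa", P, V)

def bellman_operator(V, P, R, gamma):
    """Bellman operator T: (T V)(x) = max_a Q(x, a)."""
    return q_values(V, P, R, gamma).max(axis=1)

def greedy_policy(V, P, R, gamma):
    """Policy induced by V: pick the action maximizing Q(x, a)."""
    return q_values(V, P, R, gamma).argmax(axis=1)

def make_projection(Phi, weights):
    """Assumed approximation operator A: weighted least-squares fit
    of the target onto the span of the columns of Phi."""
    w = np.sqrt(weights)
    def approx(target):
        coef, *_ = np.linalg.lstsq(w[:, None] * Phi, w * target, rcond=None)
        return Phi @ coef
    return approx

def weighted_lp_norm(f, mu, p):
    """Weighted Lp-norm: (sum_x mu(x) * |f(x)|^p)^(1/p)."""
    return float((mu * np.abs(f) ** p).sum() ** (1.0 / p))

def avi(P, R, gamma, approx, n_iter=50):
    """Run V_{n+1} = A T V_n from V_0 = 0; also return the errors V_{n+1} - T V_n."""
    V = np.zeros(R.shape[0])
    errors = []
    for _ in range(n_iter):
        TV = bellman_operator(V, P, R, gamma)
        V_next = approx(TV)
        errors.append(V_next - TV)
        V = V_next
    return V, errors

# Hypothetical toy problem: random transitions and rewards.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 2, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)          # each row is a probability distribution
R = rng.random((n_states, n_actions))
mu = np.full(n_states, 1.0 / n_states)     # uniform weighting of states
Phi = np.vander(np.linspace(0.0, 1.0, n_states), 3, increasing=True)

V, errors = avi(P, R, gamma, make_projection(Phi, mu), n_iter=30)
policy = greedy_policy(V, P, R, gamma)
l2_errors = [weighted_lp_norm(e, mu, 2) for e in errors]
```

When `Phi` is the identity matrix, the fit is exact, every approximation error vanishes, and the loop reduces to ordinary value iteration. With fewer features than states the errors are generally nonzero, and the iterates are not guaranteed to converge.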

Rémi Munos
