Bias and Variance Approximation in Value Function Estimates

We consider a finite-state, finite-action, infinite-horizon, discounted-reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories of customers of a mail-order catalog firm.
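
The paper's closed-form bias and variance approximations are not reproduced here; the Python sketch below only illustrates the setting under made-up parameters (n_states, gamma, n_samples are all illustrative assumptions, and rewards are treated as known for simplicity). It estimates transition probabilities from simulated transition counts, computes the value function V = (I - gamma*P)^{-1} r under a fixed policy, and uses a parametric bootstrap as a stand-in for the closed-form approximations to gauge the bias and variance of the resulting value estimates and a rough confidence interval.

# Minimal sketch: value function from empirical MDP estimates, with a
# parametric bootstrap standing in for the paper's closed-form bias and
# variance approximations. All numbers are illustrative assumptions.
import numpy as np

def value_function(P, r, gamma):
    # Exact value of a fixed policy in a finite MDP: V = (I - gamma*P)^{-1} r
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

rng = np.random.default_rng(0)
n_states, gamma, n_samples = 4, 0.9, 50

# "True" model, unknown in practice; fixed here only to generate data.
P_true = rng.dirichlet(np.ones(n_states), size=n_states)
r_true = rng.uniform(0.0, 1.0, size=n_states)
V_true = value_function(P_true, r_true, gamma)

# Empirical transition model from a finite sample of transitions per state.
counts = np.vstack([rng.multinomial(n_samples, P_true[s]) for s in range(n_states)])
P_hat = counts / counts.sum(axis=1, keepdims=True)
V_hat = value_function(P_hat, r_true, gamma)

# Parametric bootstrap: resample counts from the fitted model to approximate
# the bias and variance of V_hat (a substitute for closed-form expressions).
B = 2000
V_boot = np.empty((B, n_states))
for b in range(B):
    c = np.vstack([rng.multinomial(n_samples, P_hat[s]) for s in range(n_states)])
    V_boot[b] = value_function(c / c.sum(axis=1, keepdims=True), r_true, gamma)

bias = V_boot.mean(axis=0) - V_hat
std = V_boot.std(axis=0)
print("V_hat:", np.round(V_hat, 3))
print("bootstrap bias estimate:", np.round(bias, 3))
print("approx. 95% CI half-width:", np.round(1.96 * std, 3))

The bootstrap is used here only because the paper's analytical expressions are not available in this excerpt; with those expressions, the bias correction and confidence intervals would be computed directly from P_hat and the sample sizes.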
