Max-norm Projections for Factored MDPs

Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a value function, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution algorithms have been proposed, many of which use a linear combination of basis functions as a compact approximation to the value function. Almost all of these algorithms use an approximation based on the (weighted) L2-norm (Euclidean distance); this approach prevents the application of standard convergence results for MDP algorithms, all of which are based on max-norm. This paper makes two contributions. First, it presents the first approximate MDP solution algorithms - both value and policy iteration - that use max-norm projection, thereby directly optimizing the quantity required to obtain the best error bounds. Second, it shows how these algorithms can be applied efficiently in the context of factored MDPs, where the transition model is specified using a dynamic Bayesian network.

[1]  Sandra L. Berger Massachusetts , 1896, The Journal of comparative medicine and veterinary archives.

[2]  E. Stiefel Note on Jordan elimination, linear programming and Tchebycheff approximation , 1960 .

[3]  R. Bellman,et al.  Polynomial approximation—a new computational technique in dynamic programming: Allocation processes , 1962 .

[4]  Herbert A. Simon,et al.  The Sciences of the Artificial , 1970 .

[5]  Umberto Bertelè,et al.  Nonserial Dynamic Programming , 1972 .

[6]  G. Alexits Approximation theory , 1983 .

[7]  Keiji Kanazawa,et al.  A model for reasoning about persistence and causation , 1989 .

[8]  Ronald J. Williams,et al.  Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .

[9]  Frank Jensen,et al.  From Influence Diagrams to junction Trees , 1994, UAI.

[10]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[11]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[12]  Craig Boutilier,et al.  Abstraction and Approximate Decision-Theoretic Planning , 1997, Artif. Intell..

[13]  Rina Dechter,et al.  Bucket Elimination: A Unifying Framework for Reasoning , 1999, Artif. Intell..

[14]  Daphne Koller,et al.  Computing Factored Value Functions for Policies in Structured MDPs , 1999, IJCAI.

[15]  Craig Boutilier,et al.  Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..

[16]  Daphne Koller,et al.  Policy Iteration for Factored MDPs , 2000, UAI.