Active Learning of Dynamic Bayesian Networks in Markov Decision Processes

Several recent techniques for solving Markov decision processes use dynamic Bayesian networks to represent tasks compactly. When the dynamic Bayesian network representation is not given, it must be learned before these techniques can be applied. We develop an algorithm for learning dynamic Bayesian network representations of Markov decision processes from data collected through exploration of the environment. To accelerate data collection, we develop a novel scheme for active learning of the networks. We assume that the process cannot be sampled in arbitrary states, only along trajectories, which prevents us from applying existing active learning techniques. Our active learning scheme selects actions that maximize the total entropy of the distributions used to evaluate potential refinements of the networks.
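The action-selection idea in the last sentence can be illustrated with a minimal sketch. It is not the paper's algorithm; it only shows one way a "maximize total entropy" rule could look under stated assumptions. The assumptions are: each action keeps one count vector per candidate refinement, each count vector records how often each value of the refinement's variable has been observed, and executing an action in the current state adds one observation to each of that action's count vectors. The names `entropy`, `total_entropy_after`, `select_action`, `counts_by_action`, and `observed_values` are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def total_entropy_after(action_counts, observed_values):
    """Sum of entropies over all candidate refinements of one action,
    after adding the observation the current state would contribute.

    action_counts: list of count vectors, one per candidate refinement (assumed layout).
    observed_values: index of the value each refinement variable takes
        in the current state (assumed to be known before acting).
    """
    total = 0.0
    for counts, value in zip(action_counts, observed_values):
        updated = list(counts)  # copy so the stored counts are not modified
        updated[value] += 1
        total += entropy(updated)
    return total


def select_action(counts_by_action, observed_values):
    """Pick the action whose updated count vectors have the largest total entropy."""
    return max(
        counts_by_action,
        key=lambda a: total_entropy_after(counts_by_action[a], observed_values),
    )


# Hypothetical example: two actions, two candidate refinements each.
counts_by_action = {
    "left": [[5, 0], [3, 3]],   # value 0 already dominates the first refinement
    "right": [[0, 5], [3, 3]],  # value 0 has never been seen for the first refinement
}
observed_values = [0, 0]
chosen = select_action(counts_by_action, observed_values)
```

In this example the rule prefers the action for which the current state supplies a value that is under-represented in the data, since that observation makes the corresponding distribution more uniform.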
