Paying attention to what matters: observation abstraction in partially observable environments

Autonomous agents may not have access to complete information about the state of their environment. For example, a robot soccer player may only be able to estimate the locations of players outside the range of its sensors. However, even though not all the information needed for ideal decision making can be sensed, not all that can be sensed is needed. The noise and motion of spectators, for example, can be ignored in order to focus on the playing field. Standard formulations do not address this situation, assuming that everything that can be sensed must be included in any useful abstraction. This dissertation extends the Markov Decision Process Homomorphism framework (Ravindran, 2004) to partially observable domains, focusing specifically on reducing Partially Observable Markov Decision Processes (POMDPs) when the model is known. The reduction ignores the aspects of the observation function that are irrelevant to a particular task. Abstraction is particularly important in partially observable domains, as it yields a smaller domain model and thus more efficient use of the observed features.
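
As a rough illustration of the kind of observation abstraction involved (a minimal sketch, not the dissertation's actual construction), the code below merges observations of a known POMDP whose likelihood vectors O(o | s', a) are proportional across successor states for every action. Such observations induce identical normalized belief updates, so collapsing them into one abstract observation preserves the belief-state dynamics. The array layout and the `merge_equivalent_observations` helper are illustrative assumptions.

```python
import numpy as np

def merge_equivalent_observations(O, tol=1e-9):
    """Group observations whose likelihood vectors O[a, s', o] are
    proportional across successor states s' for every action a.

    Such observations yield the same normalized belief update
    b'(s') ~ O(o | s', a) * sum_s T(s' | s, a) b(s), so collapsing
    them does not change the belief-state dynamics of the POMDP.

    O: array of shape (num_actions, num_states, num_obs).
    Returns a list of observation index groups (the abstract observations).
    """
    num_actions, num_states, num_obs = O.shape
    # Flatten each observation's likelihood vectors over all actions and
    # states, then normalize: proportional vectors become identical.
    signatures = []
    for o in range(num_obs):
        v = O[:, :, o].reshape(-1).astype(float)
        norm = v.sum()
        signatures.append(v / norm if norm > tol else v)

    groups = []
    for o, sig in enumerate(signatures):
        for group in groups:
            # Compare against the group's representative observation.
            if np.allclose(signatures[group[0]], sig, atol=tol):
                group.append(o)
                break
        else:
            groups.append([o])
    return groups

if __name__ == "__main__":
    # One action, two states, three observations; observations 1 and 2 are
    # proportional copies of each other, so they collapse into one group.
    O = np.array([[[0.5, 0.4, 0.1],
                   [0.1, 0.72, 0.18]]])
    print(merge_equivalent_observations(O))  # -> [[0], [1, 2]]
```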

[1] A. Barto, et al. An algebraic approach to abstraction in reinforcement learning, 2004.

[2] Craig Boutilier, et al. Symbolic Dynamic Programming for First-Order MDPs, 2001, IJCAI.

[3] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.

[4] Andrew G. Barto, et al. Reinforcement learning, 1998.

[5] Erik Talvitie, et al. Building Incomplete but Accurate Models, 2008, ISAIM.

[6] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.

[7] John G. Kemeny, et al. Finite Markov Chains, 1960.

[8] Avi Pfeffer, et al. Sufficiency, Separability and Temporal Probabilistic Models, 2001, UAI.

[9] Sridhar Mahadevan, et al. Samuel Meets Amarel: Automating Value Function Approximation Using Global State Space Analysis, 2005, AAAI.

[10] Robert Givan, et al. Model Minimization in Markov Decision Processes, 1997, AAAI/IAAI.

[11] Alicia P. Wolfe, et al. Decision Tree Methods for Finding Reusable MDP Homomorphisms, 2006, AAAI.

[12] Craig Boutilier, et al. Abstraction and Approximate Decision-Theoretic Planning, 1997, Artif. Intell.

[13] Shlomo Zilberstein, et al. Value-based observation compression for DEC-POMDPs, 2008, AAMAS.

[14] Leslie Pack Kaelbling, et al. Toward Approximate Planning in Very Large Stochastic Domains, 1994, AAAI.

[15] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.

[16] J. Hartmanis, et al. Algebraic Structure Theory of Sequential Machines, 1966.

[17] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[18] Craig Boutilier, et al. Value-Directed Compression of POMDPs, 2002, NIPS.

[19] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.

[20] Vishal Soni, et al. Abstraction in Predictive State Representations, 2007, AAAI.

[21] Michael H. Bowling, et al. Action respecting embedding, 2005, ICML.

[22] Vishal Soni, et al. Relational Knowledge with Predictive State Representations, 2007, IJCAI.

[23] Charles Lee Isbell, et al. Looping suffix tree-based inference of partially observable hidden state, 2006, ICML.