Efficient Identification of State in Reinforcement Learning

A very general framework for modeling uncertainty in learning environments is given by partially observable Markov decision processes (POMDPs). In a POMDP setting, the learning agent infers a policy for acting optimally in all possible states of the environment, while receiving only partial observations of these states. To represent an optimal policy for a POMDP, it is generally necessary to employ some form of memory. Perfect memory is represented by the belief space, i.e., the space of probability distributions over environmental states. Unfortunately, computing policies defined on the belief space requires a considerable amount of prior knowledge about the learning problem and is computationally expensive. More precisely, maintaining belief states requires knowledge of both the system dynamics of the process and the function generating the observations. In this article, we present a reinforcement learning algorithm based on short-term memory for solving deterministic POMDPs. A well-known implementation of short-term memory is the history list. In contrast to belief states, history lists do not in general allow optimal policies to be inferred, but they are far more practical and require no prior knowledge about the learning problem.
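
To make the prior-knowledge requirement concrete, the following is the standard Bayesian belief update; the symbols are our notation and do not appear in the text above: T denotes the transition model, O the observation function, b the current belief, a the action taken, and o the observation received. The updated belief over successor states s' is

\[
b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)},
\qquad
\Pr(o \mid b, a) \;=\; \sum_{s'} O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s).
\]

Evaluating this update requires both T and O, i.e., a full model of the process dynamics and of how observations are generated. A history list, by contrast, merely records the most recent actions and observations actually experienced, which is why it can be maintained without any such model.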