Towards a practical Bayes-optimal agent

Agents that must learn to navigate complex environments require rich statistical models. However, it has been unclear how planning methods can take advantage of such models. Myopic methods such as Thompson Sampling have shortcomings that we illustrate with formal counterexamples. We show that Bayes-adaptive planning can be combined in a principled way with approximate sampling, and we demonstrate the power of the resulting method on a challenging task involving safe exploration. These results highlight the importance of propagating beliefs in realistic settings that involve trade-offs between exploration and exploitation.
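To make the myopia concrete: Thompson Sampling commits to an action using a single posterior sample, without planning over how future observations would change the belief. The following sketch (not from the paper; the bandit setup, function names, and parameters are illustrative assumptions) shows this one-sample decision rule on a Bernoulli bandit with Beta posteriors.

```python
import random

def thompson_step(successes, failures, rng):
    """One Thompson Sampling decision for a Bernoulli bandit.

    Myopic by construction: draw a single sample from each arm's
    Beta posterior and greedily pick the largest, with no lookahead
    over future belief updates.
    """
    samples = [rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

# Illustrative simulation: arm 1 has the higher true payoff.
true_p = [0.3, 0.7]
succ, fail = [0, 0], [0, 0]
rng = random.Random(1)
for _ in range(2000):
    arm = thompson_step(succ, fail, rng)
    if rng.random() < true_p[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1
```

Each decision here depends only on the current posterior, which is exactly the property the counterexamples exploit; a Bayes-adaptive planner would instead evaluate actions by their effect on future beliefs.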