Non-Parametric Policy Learning for High-Dimensional State Representations

Learning complex control policies from high-dimensional sensory input is a central challenge for reinforcement learning algorithms. Non-parametric methods can help address this problem, but many current approaches rely on unstable greedy maximization. In this paper, we develop a kernel-based reinforcement learning algorithm that performs robust policy updates. We show that our method outperforms related approaches and is able to learn an underpowered swing-up task directly from high-dimensional image data.