Practical Reinforcement Learning in Continuous Spaces

Dynamic control tasks are good candidates for the application of reinforcement learning techniques. However, many of these tasks inherently have continuous state or action variables, which causes problems for traditional reinforcement learning algorithms that assume discrete state and action spaces. In this paper, we introduce an algorithm that safely approximates the value function for continuous-state control tasks and that learns quickly from a small amount of data. We give experimental results using this algorithm to learn policies both for a simulated task and for a real robot operating in an unaltered environment. The algorithm works well in a traditional learning setting, and learns substantially more effectively when bootstrapped with a small amount of human-provided data.
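To make the continuous-state difficulty concrete, the sketch below shows generic semi-gradient Q-learning with a linear approximator over radial-basis features on a toy one-dimensional task, in place of a discrete lookup table. This illustrates only the general setting the abstract describes; it is not the algorithm introduced in this paper, and the toy environment, feature construction, and hyperparameters are assumptions made purely for the example.

```python
# Generic semi-gradient Q-learning with linear function approximation over
# RBF features, on an assumed toy 1-D continuous-state task. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 2                              # action 0 = move left, 1 = move right
CENTERS = np.linspace(0.0, 1.0, 10)        # RBF centers spanning the state interval
WIDTH = 0.1

def features(s):
    """Radial-basis features for a scalar state in [0, 1]."""
    return np.exp(-0.5 * ((s - CENTERS) / WIDTH) ** 2)

def step(s, a):
    """One noisy transition; reward 1.0 only on reaching the right edge."""
    raw = s + (0.05 if a == 1 else -0.05) + rng.normal(0.0, 0.01)
    s_next = float(np.clip(raw, 0.0, 1.0))
    done = s_next >= 1.0
    return s_next, (1.0 if done else 0.0), done

# One weight vector per action: Q(s, a) = w[a] . features(s)
w = np.zeros((N_ACTIONS, len(CENTERS)))
alpha, gamma, epsilon = 0.1, 0.95, 0.2

for episode in range(300):
    s = float(rng.uniform(0.0, 1.0))
    for t in range(100):                   # cap episode length
        phi = features(s)
        q = w @ phi
        if rng.random() < epsilon:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(rng.choice(np.flatnonzero(q == q.max())))  # random tie-break
        s_next, r, done = step(s, a)
        # TD target: greedy value at the next state, zero if terminal.
        target = r if done else r + gamma * float(np.max(w @ features(s_next)))
        w[a] += alpha * (target - q[a]) * phi   # semi-gradient update
        s = s_next
        if done:
            break

print("Approximate Q-values at s = 0.9:", w @ features(0.9))
```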