The Neuro Slot Car Racer: Reinforcement Learning in a Real World Setting

This paper describes a novel real-world reinforcement learning application: the Neuro Slot Car Racer. It presents the system and first results obtained with Neural Fitted Q-Iteration, a standard batch reinforcement learning technique, and proposes an extension, Neuralgic Pattern Selection, that shortens training and improves results by reducing the number of samples required for successful training. The approach achieves this by applying a failure-probability function that emphasizes neuralgic parts of the state space during sampling.
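The idea of sampling with emphasis on failure-prone regions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the state is reduced to a normalized track position in [0, 1) on a circular track, the hypothetical `failure_probability` simply measures the share of past failures near a position, and the hypothetical `select_patterns` draws transitions with replacement using weights proportional to that estimate plus a small floor so that no region is ignored entirely.

```python
import random


def failure_probability(position, failure_positions, width=0.05):
    """Hypothetical failure-probability estimate for a track position in [0, 1).

    Returns the fraction of recorded failures that lie within `width` of
    `position`, treating the track as circular. Illustrative only.
    """
    if not failure_positions:
        return 0.0

    def circular_distance(a, b):
        d = abs(a - b) % 1.0
        return min(d, 1.0 - d)

    near = sum(1 for f in failure_positions
               if circular_distance(position, f) <= width)
    return near / len(failure_positions)


def select_patterns(transitions, failure_positions, k, floor=0.05, rng=None):
    """Draw k transitions with replacement, favouring failure-prone positions.

    Each transition is a dict with a "position" key (an assumed layout).
    `floor` keeps a non-zero weight for regions without recorded failures.
    """
    rng = rng or random.Random()
    weights = [floor + failure_probability(t["position"], failure_positions)
               for t in transitions]
    return rng.choices(transitions, weights=weights, k=k)


if __name__ == "__main__":
    # Toy data: 100 evenly spaced transitions, all failures around position 0.5.
    data = [{"position": i / 100} for i in range(100)]
    batch = select_patterns(data, [0.5] * 10, k=20, rng=random.Random(0))
    print(sorted(t["position"] for t in batch))
```

In this toy setup most of the selected transitions come from the neighbourhood of position 0.5, so a batch learner would see the difficult section of the track more often than a uniform sample would show it.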
