Adaptive Exploration Using Stochastic Neurons

Stochastic neurons are used to adapt exploration parameters efficiently by means of gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning with discrete actions. Its main advantage is memory efficiency, because exploratory data needs to be memorized only for starting states. Hence, if a learning problem consists of only one starting state, the exploratory data can be treated as global. Results suggest that the presented approach can be efficiently combined with standard off- and on-policy algorithms such as Q-learning and Sarsa.
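As a rough illustration of the idea, the sketch below models one stochastic neuron as a Gaussian unit whose output is used as the exploration rate for an episode, and whose mean and standard deviation are adjusted by a REINFORCE-style gradient-following rule driven by the episode return. This is not taken from the paper: the class name `GaussianExplorationNeuron`, the running-average baseline, the clipping bounds, the learning rates, and the scaling of the step size by sigma squared are all assumptions made for illustration.

```python
import random


class GaussianExplorationNeuron:
    """Hypothetical Gaussian stochastic unit that emits an exploration rate.

    Its parameters (mu, sigma) follow a REINFORCE-style update; all default
    values here are illustrative assumptions, not settings from the paper.
    """

    def __init__(self, mu=0.5, sigma=0.2, lr=0.05, baseline_lr=0.1,
                 sigma_min=0.01, sigma_max=1.0):
        self.mu = mu
        self.sigma = sigma
        self.lr = lr
        self.baseline_lr = baseline_lr
        self.sigma_min = sigma_min
        self.sigma_max = sigma_max
        self.baseline = 0.0  # running average of observed returns

    def sample(self, rng):
        """Return the raw Gaussian draw and the rate clipped to [0, 1]."""
        raw = rng.gauss(self.mu, self.sigma)
        return raw, min(1.0, max(0.0, raw))

    def update(self, raw, episode_return):
        """Move mu and sigma along the log-likelihood gradient of the draw."""
        advantage = episode_return - self.baseline
        diff = raw - self.mu
        # Gradients of log N(raw; mu, sigma), each multiplied by sigma^2
        # (an assumed step-size scaling that keeps updates bounded).
        grad_mu = diff
        grad_sigma = (diff ** 2 - self.sigma ** 2) / self.sigma
        self.mu += self.lr * advantage * grad_mu
        self.sigma += self.lr * advantage * grad_sigma
        # Assumed safeguards: keep the parameters in a usable range.
        self.mu = min(1.0, max(0.0, self.mu))
        self.sigma = min(self.sigma_max, max(self.sigma_min, self.sigma))
        # Update the baseline after it has been used for the advantage.
        self.baseline += self.baseline_lr * (episode_return - self.baseline)
```

Under these assumptions, a learner would keep one such neuron per starting state (for example in a dictionary keyed by state), call `sample` when an episode begins, use the clipped value as the exploration rate of Q-learning or Sarsa for that episode, and call `update` with the raw draw and the observed return once the episode ends. With a single starting state, one neuron would serve the whole problem.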
