Paper Information - Adaptive Exploration Using Stochastic Neurons

Adaptive Exploration Using Stochastic Neurons

Stochastic neurons are deployed for efficient adaptation of exploration parameters by gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning with discrete actions. Its main advantage is memory efficiency, because exploratory data needs to be stored only for starting states. Hence, if a learning problem consists of only one starting state, the exploratory data can be treated as global. Results suggest that the presented approach can be combined efficiently with standard off- and on-policy algorithms such as Q-learning and Sarsa.
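
To make the idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a single Gaussian stochastic unit with a fixed standard deviation whose sample is clipped to serve as the epsilon of an epsilon-greedy policy, and it adapts the unit's mean once per episode with a REINFORCE-style gradient-following step. The five-state corridor, the step sizes, the running-average baseline, and all function names are hypothetical choices made for illustration only. Because the corridor has one starting state, a single (mu, sigma) pair is all the exploratory data that has to be stored.

```python
import random


def sample_exploration_rate(mu, sigma, rng):
    """Stochastic unit: draw a Gaussian sample and clip it to a valid epsilon."""
    x = rng.gauss(mu, sigma)
    return x, min(1.0, max(0.0, x))


def reinforce_update(mu, sigma, x, ret, baseline, lr):
    """REINFORCE-style step for a Gaussian mean: lr * (R - b) * (x - mu) / sigma^2."""
    return mu + lr * (ret - baseline) * (x - mu) / sigma ** 2


def run(episodes=500, seed=0):
    rng = random.Random(seed)
    n_states, goal = 5, 4                      # assumed corridor: states 0..4, start at 0
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    mu, sigma = 0.5, 0.1                       # exploration data for the single start state
    baseline = 0.0                             # running average of episode returns
    for _ in range(episodes):
        # Sample the exploration rate once, at the starting state.
        x, eps = sample_exploration_rate(mu, sigma, rng)
        s, ret = 0, 0.0
        for _ in range(50):                    # cap on episode length
            if rng.random() < eps:
                a = rng.randrange(2)           # explore
            else:
                a = 0 if q[s][0] > q[s][1] else 1  # exploit; ties go right
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else -0.01
            # Q-learning update (off-policy TD target).
            target = r if s2 == goal else r + 0.9 * max(q[s2])
            q[s][a] += 0.1 * (target - q[s][a])
            ret += r
            s = s2
            if s == goal:
                break
        # Adapt the exploration parameter from the episode return.
        mu = reinforce_update(mu, sigma, x, ret, baseline, lr=0.001)
        mu = min(1.0, max(0.0, mu))            # keep the mean a valid probability
        baseline += 0.1 * (ret - baseline)
    return mu, q


if __name__ == "__main__":
    mu, q = run()
    print("adapted mean exploration rate:", mu)
    print("Q-values:", q)
```

Replacing the Q-learning target with the value of the action actually taken in the next state would give a Sarsa variant of the same sketch; the exploration update itself would stay unchanged.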

Günther Palm | Michel Tokic
