Reinforcement learning is a technique for learning action policies that maximize utility, guided by reinforcement signals: reward or punishment. Q-learning, a widely used reinforcement learning method, has been studied extensively in research on autonomous agents. However, as the size of the problem space increases, agents need more computational resources and more time to learn appropriate policies. Whitehead proposed an architecture called modular Q-learning, which decomposes the whole problem space into smaller subproblem spaces and distributes them among multiple modules, so that each module takes charge of part of the whole problem.
In modular Q-learning, however, human designers have to decompose the problem space and create a suitable set of modules manually. Agents with such a fixed module architecture cannot adapt themselves to dynamic environments. Here, we propose a new architecture for reinforcement learning called AMQL (Automatic Modular Q-Learning), which enables agents to construct a suitable set of modules by themselves using a selection method.
Through experiments, we show that agents can automatically obtain suitable modules for gaining rewards. Furthermore, we show that agents can adapt themselves to dynamic environments efficiently by reconstructing their modules.
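To make the modular architecture concrete, the following is a minimal sketch of modular Q-learning in the style described above. It is not the paper's AMQL algorithm (the module-selection mechanism is not specified here); it only illustrates the fixed-module baseline: each module runs tabular Q-learning over its own sub-state, and the agent merges the modules' Q-values by summation to choose one action. All class and variable names (`QModule`, `ModularAgent`, and the summation merge) are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


class QModule:
    """One module: tabular Q-learning over its own sub-state space."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.q = defaultdict(float)  # (sub_state, action) -> estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor

    def update(self, s, a, r, s_next, actions):
        # Standard Q-learning update on this module's sub-state only.
        best_next = max(self.q[(s_next, a2)] for a2 in actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])


class ModularAgent:
    """Merges per-module Q-values by summation to pick a single action."""

    def __init__(self, n_modules, actions):
        self.modules = [QModule() for _ in range(n_modules)]
        self.actions = actions

    def act(self, sub_states, epsilon=0.1):
        # sub_states[i] is the portion of the full state seen by module i.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(
            self.actions,
            key=lambda a: sum(m.q[(s, a)] for m, s in zip(self.modules, sub_states)),
        )

    def learn(self, sub_states, a, r, next_sub_states):
        # Every module learns from the shared action and reward.
        for m, s, s2 in zip(self.modules, sub_states, next_sub_states):
            m.update(s, a, r, s2, self.actions)
```

In this sketch the module set is fixed by hand, which is exactly the limitation AMQL addresses: the paper's contribution is to let the agent add, remove, and reconstruct such modules automatically via selection.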