
Generalization and Scaling in Reinforcement Learning

In associative reinforcement learning, an environment generates input vectors, a learning system generates possible output vectors, and a reinforcement function computes feedback signals from the input-output pairs. The task is to discover and remember input-output pairs that generate rewards. Especially difficult cases occur when rewards are rare, since the expected time for any algorithm can grow exponentially with the size of the problem. Nonetheless, if a reinforcement function possesses regularities and a learning algorithm exploits them, learning time can be reduced below that of non-generalizing algorithms. This paper describes a neural network algorithm called complementary reinforcement back-propagation (CRBP) and reports simulation results on problems designed to offer differing opportunities for generalization.
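The learning loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration of the "complementary" idea as summarized in the abstract, not the authors' implementation. Each sigmoid output is read as the probability that the corresponding output bit is 1, and a binary output vector is sampled from those probabilities. One back-propagation step then moves the network toward the sampled vector on reward, or toward its bitwise complement on punishment. The class name CRBPNet, the single hidden layer, the learning rate, the ±1 reinforcement values, the single training pass per trial and the toy copy_reward task are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class CRBPNet:
    """Hypothetical minimal CRBP-style learner with one hidden layer (an assumption)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        self.rng = random.Random(seed)
        self.lr = lr
        # Small random weights; the last entry of each row is a bias weight.
        self.w1 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # input plus bias
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]  # hidden activations plus bias
        s = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, s

    def act(self, x):
        # Read each output activation s_k as P(bit k = 1) and sample a binary vector.
        xb, hb, s = self.forward(x)
        o = [1 if self.rng.random() < sk else 0 for sk in s]
        return o, (xb, hb, s)

    def train(self, cache, target):
        # One back-propagation step on squared error toward `target`.
        xb, hb, s = cache
        d_out = [(t - sk) * sk * (1.0 - sk) for t, sk in zip(target, s)]
        # Hidden deltas use the output weights before they are updated.
        d_hid = [hb[j] * (1.0 - hb[j]) *
                 sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(self.w1))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.lr * d_hid[j] * xb[i]

    def step(self, x, reinforce):
        # One trial: emit an output, receive reinforcement, then learn.
        o, cache = self.act(x)
        r = reinforce(x, o)  # assumed convention: r > 0 is reward, otherwise punishment
        # Reward: train toward the emitted vector.
        # Punishment: train toward its bitwise complement.
        target = o if r > 0 else [1 - b for b in o]
        self.train(cache, target)
        return r


if __name__ == "__main__":
    # Hypothetical toy task: reward only when the 3-bit output copies the input.
    def copy_reward(x, o):
        return 1 if o == list(x) else -1

    inputs = [[(n >> b) & 1 for b in range(3)] for n in range(8)]
    net = CRBPNet(3, 6, 3, lr=0.5, seed=0)
    picker = random.Random(42)
    rewards = [net.step(picker.choice(inputs), copy_reward) for _ in range(3000)]
    print("reward rate over last 500 trials:", rewards[-500:].count(1) / 500)
```

Uniform random guessing on copy_reward is rewarded on about one trial in eight. A printed reward rate well above 0.125 therefore suggests that the sketch is exploiting the regularity in the reinforcement function. The exact figure will vary with the seed and hyperparameters.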

David H. Ackley | Michael L. Littman
