Paper information - The Steering Approach for Multi-Criteria Reinforcement Learning

The Steering Approach for Multi-Criteria Reinforcement Learning

We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem is discussed as well.
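The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general approachability idea it refers to, not the authors' method. It assumes the target set is an axis-aligned box, that "steering" means pointing from the current average reward vector toward the closest point of the target set, and that each scalar-reward learner is tied to one direction from a fixed finite list. All function names (`project_onto_box`, `steering_direction`, `nearest_direction_index`, `scalarize`) are hypothetical.

```python
import math


def project_onto_box(point, lower, upper):
    """Euclidean projection of `point` onto the box [lower, upper] (assumed target set)."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(point, lower, upper)]


def steering_direction(avg_reward, lower, upper):
    """Unit vector from the average reward toward the box, or None if already inside."""
    closest = project_onto_box(avg_reward, lower, upper)
    diff = [c - a for c, a in zip(closest, avg_reward)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return None  # the average reward vector already lies in the target set
    return [d / norm for d in diff]


def nearest_direction_index(direction, directions):
    """Index of the fixed direction with the largest inner product with `direction`.

    Assumption: one scalar-reward learner is associated with each entry of `directions`.
    """
    scores = [sum(a * b for a, b in zip(direction, u)) for u in directions]
    return max(range(len(directions)), key=lambda i: scores[i])


def scalarize(reward, direction):
    """Scalar reward seen by a learner: the vector reward projected on its direction."""
    return sum(r * d for r, d in zip(reward, direction))


# Example: two criteria, target box [1, 2] x [1, 2], four axis-aligned directions.
directions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
avg = [0.0, 3.0]
u = steering_direction(avg, [1.0, 1.0], [2.0, 2.0])
learner = nearest_direction_index(u, directions)
```

In this sketch the agent would run the selected scalar learner, feeding it `scalarize(reward, directions[learner])`, until the steering direction is recomputed. How and when the paper switches between learners is not stated in the abstract and is left open here.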

Shie Mannor | Nahum Shimkin
