Local-utopia policy selection for multi-objective reinforcement learning

Many real-world applications involve multiple conflicting objectives. In such problems, optimality is replaced by Pareto optimality, and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, selecting a Pareto-optimal policy from the frontier remains an important problem, prominent in practical applications such as economics and robotics. In this paper, we present a versatile approach for selecting a policy from the Pareto frontier according to user-defined preferences. Exploiting a novel scalarization function and heuristics, our approach provides an easy-to-use and effective method for Pareto-optimal policy selection. Furthermore, the scalarization is applicable in multiple-policy learning strategies for approximating Pareto frontiers. To demonstrate the simplicity and effectiveness of our algorithm, we evaluate it on two problems and compare it to classical multi-objective reinforcement learning approaches.
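The paper's exact scalarization is not reproduced here; the following is a minimal sketch of a common utopia-point distance scalarization for compromise selection, under the assumptions that `frontier` is an array of per-policy expected-return vectors (one row per Pareto-optimal policy, higher is better) and `weights` encodes the user's preferences over objectives. The function names and normalization choice are illustrative, not taken from the paper.

```python
import numpy as np

def utopia_scalarization(frontier, weights):
    """Score each policy by its weighted Euclidean distance to the utopia
    point (the per-objective maxima over the frontier); lower is better."""
    frontier = np.asarray(frontier, dtype=float)
    weights = np.asarray(weights, dtype=float)
    utopia = frontier.max(axis=0)   # best attainable value per objective
    nadir = frontier.min(axis=0)    # worst value on the frontier per objective
    # Normalize objectives to [0, 1] so distances are comparable across scales.
    span = np.where(utopia > nadir, utopia - nadir, 1.0)
    gaps = (utopia - frontier) / span   # 0 at the utopia point, 1 at the nadir
    return np.sqrt((weights * gaps ** 2).sum(axis=1))

def select_policy(frontier, weights):
    """Return the index of the Pareto-optimal policy closest to the utopia point."""
    return int(np.argmin(utopia_scalarization(frontier, weights)))

# With equal weights, the balanced compromise is preferred over the extremes:
frontier = [[1.0, 0.0], [0.8, 0.8], [0.0, 1.0]]
print(select_policy(frontier, [1.0, 1.0]))  # → 1
```

Normalizing by the frontier's utopia–nadir span is one standard way to keep objectives with different units commensurable before applying preference weights.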
