Distributional Robustness and Regularization in Reinforcement Learning

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical explanation of why regularization generalizes well in statistical learning. Although DRO's extension to sequential decision-making addresses $\textit{external uncertainty}$ through the robust Markov Decision Process (MDP) setting, the resulting formulation is hard to solve, especially in large domains. On the other hand, existing regularization methods in reinforcement learning only address $\textit{internal uncertainty}$ due to stochasticity. Our study aims to facilitate robust reinforcement learning by establishing a dual relation between robust MDPs and regularization. We introduce Wasserstein distributionally robust MDPs and prove that they admit out-of-sample performance guarantees. Then, we introduce a new regularizer for empirical value functions and show that it lower-bounds the Wasserstein distributionally robust value function. We extend the result to linear value function approximation for large state spaces. Our approach provides an alternative formulation of robustness with guaranteed finite-sample performance. Moreover, it suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning methods.
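
To give some intuition for how a regularizer can lower-bound a Wasserstein robust value, the sketch below uses the standard Kantorovich-Rubinstein bound for a 1-Wasserstein ball on a finite state space: the worst-case expectation of a value vector is at least its empirical expectation minus the ball radius times the Lipschitz constant of the values. This is an illustrative stand-in, not necessarily the regularizer introduced in the paper; the function names, the ground metric, and the toy numbers are all assumptions made for the example.

```python
from itertools import combinations


def lipschitz_constant(values, dist):
    """Smallest L with |v_i - v_j| <= L * dist(i, j) for all state pairs."""
    pairs = combinations(range(len(values)), 2)
    return max(abs(values[i] - values[j]) / dist(i, j) for i, j in pairs)


def expectation(probs, values):
    """Expected value of `values` under the distribution `probs`."""
    return sum(p * v for p, v in zip(probs, values))


def regularized_value(p_hat, values, dist, eps):
    """Empirical expectation penalized by eps * Lipschitz constant.

    By Kantorovich-Rubinstein duality this lower-bounds the worst-case
    expectation over the 1-Wasserstein ball of radius eps around p_hat.
    """
    return expectation(p_hat, values) - eps * lipschitz_constant(values, dist)


# Toy example (hypothetical numbers): three states on a line,
# ground metric |i - j|, next-state values decreasing along the line.
def line_metric(i, j):
    return abs(i - j)


values = [1.0, 0.5, 0.0]
p_hat = [0.5, 0.5, 0.0]   # empirical next-state distribution
eps = 0.2                 # radius of the Wasserstein ball

bound = regularized_value(p_hat, values, line_metric, eps)

# One distribution inside the ball: move 0.2 mass from state 0 to state 1,
# which costs 0.2 * |0 - 1| = 0.2 <= eps in transport.
q = [0.3, 0.7, 0.0]
shifted_value = expectation(q, values)

print("regularized lower bound:", bound)
print("value under shifted distribution:", shifted_value)
```

In this toy case the shifted distribution attains the bound up to floating-point error, so the penalty is not loose here; in general the regularized value is only guaranteed to be a lower bound on the robust value.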
