Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty

We consider Markov decision processes (MDPs) under parameter uncertainty. Previous studies all restrict attention to the case where the uncertainties in different states are uncoupled, which leads to overly conservative solutions. In contrast, we introduce an intuitive concept, termed "Lightning Does Not Strike Twice," to model coupled uncertain parameters. Specifically, we require that the system can deviate from its nominal parameters only a bounded number of times. We give probabilistic guarantees indicating that this model captures real-life situations, and we devise tractable algorithms for computing optimal control policies.
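One way to picture the bounded-deviation idea is dynamic programming over a state augmented with the number of deviations nature still has available. The sketch below is a minimal illustration under stated assumptions, not the algorithm from the paper: it assumes a finite-horizon, undiscounted MDP, a nominal transition kernel plus a finite list of alternative kernels for each state-action pair, and a deviation budget `budget`. Each time nature picks an alternative kernel, the budget drops by one, and once it reaches zero only the nominal kernel remains. The function `budgeted_robust_vi` and all of its parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: states and actions are integer indices,
# and a distribution is a list of next-state probabilities.
Dist = Sequence[float]


def expected(dist: Dist, values: Sequence[float]) -> float:
    """Expected value of `values` under the next-state distribution `dist`."""
    return sum(p * v for p, v in zip(dist, values))


def budgeted_robust_vi(
    rewards: List[List[float]],          # rewards[s][a]
    nominal: List[List[Dist]],           # nominal[s][a]: nominal next-state distribution
    deviations: List[List[List[Dist]]],  # deviations[s][a]: finite list of alternative distributions
    horizon: int,
    budget: int,
) -> Tuple[List[List[List[float]]], List[List[List[int]]]]:
    """Finite-horizon robust DP in which nature may deviate at most `budget` times.

    V[t][s][k] is the worst-case value-to-go at stage t in state s when nature
    has k deviations left; policy[t][s][k] is a maximizing action.
    """
    n_states = len(rewards)
    # Terminal values are zero for every state and every remaining budget.
    V = [[[0.0] * (budget + 1) for _ in range(n_states)] for _ in range(horizon + 1)]
    policy = [[[0] * (budget + 1) for _ in range(n_states)] for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            for k in range(budget + 1):
                best_val, best_a = float("-inf"), 0
                for a in range(len(rewards[s])):
                    # Nature's first option: follow the nominal kernel, budget unchanged.
                    worst = expected(nominal[s][a], [V[t + 1][s2][k] for s2 in range(n_states)])
                    # Nature's second option: spend one deviation on an alternative kernel.
                    if k > 0:
                        for dist in deviations[s][a]:
                            dev_val = expected(dist, [V[t + 1][s2][k - 1] for s2 in range(n_states)])
                            worst = min(worst, dev_val)
                    q = rewards[s][a] + worst
                    if q > best_val:
                        best_val, best_a = q, a
                V[t][s][k] = best_val
                policy[t][s][k] = best_a
    return V, policy
```

As a small usage example, take state 0 as a "good" state with a risky action (reward 1, which nature may redirect to an absorbing "bad" state 1) and a safe action (reward 0.6, no deviation possible). Over a horizon of 3, the agent takes the risky action when nature has no deviations left, giving value 3. With one deviation left, it switches to the safe action, giving value 2.2. Tracking the budget in the state is what lets the policy react to how many deviations nature can still use.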
