Algorithmic aspects of mean-variance optimization in Markov decision processes
