Large-Scale Markov Decision Problems with KL Control Cost and its Application to Crowdsourcing
[1] P. Schweitzer,et al. Generalized polynomial approximations in Markovian decision processes , 1985 .
[2] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[3] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[4] Xi Chen,et al. Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing , 2013, ICML.
[5] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[6] Benjamin Van Roy,et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming , 2004, Math. Oper. Res..
[7] Ohad Shamir,et al. Optimal Distributed Online Prediction Using Mini-Batches , 2010, J. Mach. Learn. Res..
[8] Milos Hauskrecht,et al. Linear Program Approximations for Factored Continuous-State Markov Decision Processes , 2003, NIPS.
[9] R. Sutton,et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS.
[10] Alexander J. Smola,et al. Parallelized Stochastic Gradient Descent , 2010, NIPS.
[11] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[12] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[13] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[14] Jean-Yves Audibert. Optimization for Machine Learning , 1995 .
[15] Emanuel Todorov,et al. Moving least-squares approximations for linearly-solvable MDP , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[16] Emanuel Todorov,et al. Linearly-solvable Markov decision problems , 2006, NIPS.
[17] Martin J. Wainwright,et al. Randomized Smoothing for Stochastic Optimization , 2011, SIAM J. Optim..
[18] A. S. Manne. Linear Programming and Sequential Decisions , 1960 .
[19] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[20] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[21] Michael H. Veatch,et al. Approximate Linear Programming for Average Cost MDPs , 2013, Math. Oper. Res..
[22] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[23] Vivek F. Farias,et al. Approximate Dynamic Programming via a Smoothed Linear Program , 2009, Oper. Res..
[24] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[25] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Vol. II , 1976 .
[26] Milos Hauskrecht,et al. Solving Factored MDPs with Continuous and Discrete Variables , 2004, UAI.
[27] Benjamin Van Roy,et al. The Linear Programming Approach to Approximate Dynamic Programming , 2003, Oper. Res..
[28] Benjamin Van Roy,et al. A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees , 2006, Math. Oper. Res..
[29] Marek Petrik,et al. Constraint relaxation in approximate linear programs , 2009, ICML.
[30] Peter L. Bartlett,et al. Linear Programming for Large-Scale Markov Decision Problems , 2014, ICML.
[31] Emanuel Todorov,et al. Eigenfunction approximation methods for linearly-solvable optimal control problems , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[32] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML.
[33] Emanuel Todorov,et al. Efficient computation of optimal actions , 2009, Proceedings of the National Academy of Sciences.
[34] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[35] Ohad Shamir,et al. Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes , 2012, ICML.