Xi Chen | Peter L. Bartlett | Yasin Abbasi-Yadkori | Alan Malek
[1] Benjamin Van Roy, et al. Approximate Linear Programming for Average-Cost Dynamic Programming, 2002, NIPS.
[2] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
[3] Shalabh Bhatnagar, et al. A Linearly Relaxed Approximate Linear Program for Markov Decision Processes, 2017, IEEE Transactions on Automatic Control.
[4] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[5] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[6] Benjamin Van Roy, et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming, 2004, Math. Oper. Res.
[7] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[8] Milos Hauskrecht, et al. Solving Factored MDPs with Continuous and Discrete Variables, 2004, UAI.
[9] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[10] Yasin Abbasi-Yadkori, et al. Optimizing over a Restricted Policy Class in Markov Decision Processes, 2018, arXiv.
[11] T. Lai, et al. Self-Normalized Processes: Limit Theory and Statistical Applications, 2001.
[12] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[13] Benjamin Van Roy, et al. A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees, 2006, Math. Oper. Res.
[14] Marek Petrik, et al. Constraint relaxation in approximate linear programs, 2009, ICML.
[15] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[16] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[17] Lin Xiao, et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction, 2014, SIAM J. Optim.
[18] Milos Hauskrecht, et al. Linear Program Approximations for Factored Continuous-State Markov Decision Processes, 2003, NIPS.
[19] Michael Bowling, et al. Dual Representations for Dynamic Programming, 2008.
[20] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[21] Csaba Szepesvari, et al. Online learning for linearly parametrized control problems, 2012.
[22] Lihong Li, et al. Scalable Bilinear π Learning Using State and Action Features, 2018, ICML.
[23] A. S. Manne. Linear Programming and Sequential Decisions, 1960.
[24] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[25] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[26] Michael H. Veatch, et al. Approximate Linear Programming for Average Cost MDPs, 2013, Math. Oper. Res.
[27] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA.
[28] Vivek F. Farias, et al. Approximate Dynamic Programming via a Smoothed Linear Program, 2009, Oper. Res.