POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
Peter L. Bartlett | Nevena Lazic | Csaba Szepesvári | Yasin Abbasi-Yadkori | Kush Bhatia | Gellért Weisz
[1] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[2] Shie Mannor,et al. Regularized Policy Iteration with Nonparametric Function Spaces , 2016, J. Mach. Learn. Res..
[3] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[4] David Budden,et al. Distributed Prioritized Experience Replay , 2018, ICLR.
[5] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[6] Dimitri P. Bertsekas,et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.
[7] Richard S. Sutton,et al. Multi-step Reinforcement Learning: A Unifying Algorithm , 2017, AAAI.
[8] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[9] Varun Kanade,et al. Tracking Adversarial Targets , 2014, ICML.
[10] Marek Petrik,et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms , 2015, UAI.
[11] Dimitri P. Bertsekas,et al. Error Bounds for Approximations from Projected Linear Equations , 2010, Math. Oper. Res..
[12] Shie Mannor,et al. Shallow Updates for Deep Reinforcement Learning , 2017, NIPS.
[13] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[14] Martin A. Riedmiller,et al. Quinoa: a Q-function You Infer Normalized Over Actions , 2019, ArXiv.
[15] Francesco Orabona,et al. Scale-Free Algorithms for Online Linear Optimization , 2015, ALT.
[16] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2017, J. Mach. Learn. Res..
[17] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.
[18] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[19] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[20] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn..
[21] D. Bertsekas. Approximate policy iteration: a survey and some new methods , 2011 .
[22] Marlos C. Machado,et al. Count-Based Exploration with the Successor Representation , 2018, AAAI.
[23] Benjamin Recht,et al. Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator , 2017, ICML.
[24] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..
[25] Matthieu Geist,et al. Off-policy learning with eligibility traces: a survey , 2013, J. Mach. Learn. Res..
[26] Vicenç Gómez,et al. Fast rates for online learning in Linearly Solvable Markov Decision Processes , 2017, COLT.
[27] Csaba Szepesvari,et al. Online learning for linearly parametrized control problems , 2012 .
[28] John Langford,et al. Search-based structured prediction , 2009, Machine Learning.
[29] S. Ioffe,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .
[30] R. Sutton,et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS.
[31] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.
[32] Nevena Lazic,et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction , 2018, AISTATS.
[33] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[34] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[35] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[36] Shie Mannor,et al. Markov Decision Processes with Arbitrary Reward Processes , 2009, Math. Oper. Res..
[37] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[38] Marc G. Bellemare,et al. Approximate Exploration through State Abstraction , 2018, ArXiv.
[39] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[40] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[41] C. D. Meyer,et al. Comparison of perturbation bounds for the stationary distribution of a Markov chain , 2001 .
[42] X. Cao,et al. Single Sample Path-Based Optimization of Markov Chains , 1999 .
[43] Benjamin Van Roy,et al. The Linear Programming Approach to Approximate Dynamic Programming , 2003, Oper. Res..
[44] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[45] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[46] Benjamin Van Roy,et al. Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.
[47] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[48] Georg Ostrovski,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[49] Ian Osband,et al. The Uncertainty Bellman Equation and Exploration , 2017, ICML.
[50] Marlos C. Machado,et al. A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.
[51] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.
[52] Bo Liu,et al. Regularized Off-Policy TD-Learning , 2012, NIPS.
[53] Yishay Mansour,et al. Online Markov Decision Processes , 2009, Math. Oper. Res..
[54] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[55] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[56] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[57] Zheng Wen,et al. Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization , 2013, Math. Oper. Res..
[58] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[59] K. Narendra,et al. Persistent excitation in adaptive systems , 1987 .
[60] Joel A. Tropp,et al. User-Friendly Tail Bounds for Sums of Random Matrices , 2010, Found. Comput. Math..
[61] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.