Nando de Freitas | Alexander Novikov | Ziyu Wang | Cosmin Paduraru | Caglar Gulcehre | Konrad Zolna | Tom Le Paine | Andrea Michi
[1] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[2] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[3] Finale Doshi-Velez, et al. POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning, 2020, AISTATS.
[4] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[5] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[6] Sergey Levine, et al. Off-Policy Evaluation via Off-Policy Classification, 2019, NeurIPS.
[7] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[8] Dean Pomerleau. ALVINN: An Autonomous Land Vehicle in a Neural Network, 1988, NIPS.
[9] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2018, J. Mach. Learn. Res.
[10] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[11] Richard Tanburn, et al. Making Efficient Use of Demonstrations to Solve Hard Exploration Problems, 2020, ICLR.
[12] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2020, ICML.
[13] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, arXiv.
[14] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[15] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[16] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[17] Yisong Yue, et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, 2019, arXiv.
[18] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[19] Sergio Gomez Colmenarejo, et al. Acme: A Research Framework for Distributed Reinforcement Learning, 2020, arXiv.
[20] Yuval Tassa, et al. dm_control: Software and Tasks for Continuous Control, 2020, Software Impacts.
[21] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, arXiv.
[22] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[23] Yee Whye Teh, et al. Neural Probabilistic Motor Primitives for Humanoid Control, 2019, ICLR.
[24] Csaba Szepesvári, et al. Model Selection in Reinforcement Learning, 2011, Machine Learning.
[25] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2018, AAAI.
[26] Justin Fu, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, arXiv.
[27] Nando de Freitas, et al. RL Unplugged: Benchmarks for Offline Reinforcement Learning, 2020, arXiv.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[29] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[30] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[31] Yuval Tassa, et al. MuJoCo: A Physics Engine for Model-Based Control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[32] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[33] Sergey Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, arXiv.
[34] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.
[35] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2019, ICML.
[36] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, arXiv.
[37] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[38] Martin A. Riedmiller, et al. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards, 2017, arXiv.
[39] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[40] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.
[41] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2016, ICML.
[42] Oleg O. Sushkov, et al. A Framework for Data-Driven Robotics, 2019, arXiv.
[43] Chris Dyer, et al. On the State of the Art of Evaluation in Neural Language Models, 2018, ICLR.
[44] John N. Tsitsiklis, et al. Bias and Variance in Value Function Estimation, 2004, ICML.
[45] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[46] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[47] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[48] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.