Misha Denil | Nando de Freitas | Ksenia Konyushkova | Yutian Chen | Cosmin Paduraru | Caglar Gulcehre | Thomas Paine | Daniel J Mankowitz