Active Offline Policy Selection
Ksenia Konyushkova | Yutian Chen | Tom Le Paine | Caglar Gulcehre | Cosmin Paduraru | Daniel J. Mankowitz | Misha Denil | Nando de Freitas
[1] Raia Hadsell, et al. Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes, 2021, CoRL.
[2] Mohammad Norouzi, et al. Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization, 2021, ICLR.
[3] Sergey Levine, et al. Benchmarks for Deep Off-Policy Evaluation, 2021, ICLR.
[4] Razvan Pascanu, et al. Regularized Behavior Value Estimation, 2021, arXiv.
[5] Albin Cassirer, et al. Reverb: A Framework For Experience Replay, 2021, arXiv.
[6] Bo Dai, et al. Offline Policy Selection under Uncertainty, 2020, AISTATS.
[7] John Salvatier, et al. Active Reinforcement Learning: Observing Rewards at a Cost, 2020, arXiv.
[8] Nando de Freitas, et al. Hyperparameter Selection for Offline Reinforcement Learning, 2020, arXiv.
[9] Lihong Li, et al. Off-Policy Evaluation via the Regularized Lagrangian, 2020, NeurIPS.
[10] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.
[11] Yuval Tassa, et al. dm_control: Software and Tasks for Continuous Control, 2020, Software Impacts.
[12] Sergey Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[13] Sergio Gomez Colmenarejo, et al. Acme: A Research Framework for Distributed Reinforcement Learning, 2020, arXiv.
[14] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[15] Thorsten Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[16] Sergey Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, arXiv.
[17] Quoc V. Le, et al. Chip Placement with Deep Reinforcement Learning, 2020, arXiv.
[18] Justin Fu, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, arXiv.
[19] Tom Schaul, et al. Policy Evaluation Networks, 2020, arXiv.
[20] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[21] Krzysztof Choromanski, et al. Ready Policy One: World Building Through Active Learning, 2020, ICML.
[22] Krzysztof Choromanski, et al. Effective Diversity in Population-Based Reinforcement Learning, 2020, NeurIPS.
[23] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[24] Hoang Minh Le, et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, 2019, NeurIPS Datasets and Benchmarks.
[25] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[26] Oleg O. Sushkov, et al. Scaling data-driven robotics with reward sketching and batch reinforcement learning, 2019, Robotics: Science and Systems.
[27] Rishabh Agarwal, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2019, ICML.
[28] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[29] Xinkun Nie, et al. Learning When-to-Treat Policies, 2019, Journal of the American Statistical Association.
[30] Gabriel Dulac-Arnold, et al. Challenges of Real-World Reinforcement Learning, 2019, arXiv.
[31] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.
[32] Nando de Freitas, et al. Bayesian Optimization in AlphaGo, 2018, arXiv.
[33] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[34] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[35] Pascal Fua, et al. Discovering General-Purpose Active Learning Strategies, 2018, arXiv.
[36] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.
[37] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[38] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[39] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[40] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NeurIPS.
[41] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[42] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[43] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[44] Philip S. Thomas, et al. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees, 2015, IJCAI.
[45] Alkis Gotovos, et al. Safe Exploration for Optimization with Gaussian Processes, 2015, ICML.
[46] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, Journal of Machine Learning Research.
[47] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, Journal of Machine Learning Research.
[48] Oliver Kroemer, et al. Active Reward Learning, 2014, Robotics: Science and Systems.
[49] Nando de Freitas, et al. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning, 2014, AISTATS.
[50] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[51] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NeurIPS.
[52] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[53] Ali Jalali, et al. Hybrid Batch Bayesian Optimization, 2012, ICML.
[54] Andreas Krause, et al. Contextual Gaussian Process Bandit Optimization, 2011, NeurIPS.
[55] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[56] Kevin Leyton-Brown, et al. Sequential Model-Based Optimization for General Algorithm Configuration, 2011, LION.
[57] Nando de Freitas, et al. A Bayesian interactive optimization approach to procedural animation design, 2010, SCA '10.
[58] Rémi Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.
[59] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[60] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[61] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[62] Nando de Freitas, et al. A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot, 2009, Autonomous Robots.
[63] Masashi Sugiyama, et al. Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning, 2009, IJCAI.
[64] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[65] Gerald DeJong, et al. Active reinforcement learning, 2008, ICML '08.
[66] Nando de Freitas, et al. Active Preference Learning with Discrete Choice Data, 2007, NeurIPS.
[67] Tao Wang, et al. Automatic Gait Optimization with Gaussian Process Regression, 2007, IJCAI.
[68] Christopher K. I. Williams, et al. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), 2005.
[69] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[70] Sergio Gomez Colmenarejo, et al. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning, 2020, NeurIPS.
[71] Nando de Freitas, et al. Taking the Human Out of the Loop: A Review of Bayesian Optimization, 2016, Proceedings of the IEEE.
[72] Andrea Bonarini, et al. Fitted Policy Search: Direct Policy Search using a Batch Reinforcement Learning Approach, 2010.