Model-Value Inconsistency as a Signal for Epistemic Uncertainty
Feryal M. P. Behbahani | T. Schaul | Simon Osindero | André Barreto | Angelos Filos | Gregory Farquhar | Diana Borsa | A. Friesen | Zita Marinho | Eszter Vértes
[1] Rishabh Agarwal, et al. Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation, 2021, AAAI.
[2] Zita Marinho, et al. Self-Consistent Models and Values, 2021, NeurIPS.
[3] Tao Yu, et al. PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning, 2021, NeurIPS.
[4] Ivo Danihelka, et al. Muesli: Combining Improvements in Policy Optimization, 2021, ICML.
[5] Zheng Wen, et al. Reinforcement Learning, Bit by Bit, 2021, Found. Trends Mach. Learn.
[6] Clare Lyle, et al. On The Effect of Auxiliary Tasks on Representation Dynamics, 2021, AISTATS.
[7] S. Levine, et al. PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning, 2021, ICML.
[8] Marc G. Bellemare, et al. The Value-Improvement Path: Towards Better Representations for Reinforcement Learning, 2020, AAAI.
[9] Satinder Singh, et al. The Value Equivalence Principle for Model-Based Reinforcement Learning, 2020, NeurIPS.
[10] Jane X. Wang, et al. Temporal Difference Uncertainties as a Signal for Exploration, 2020, ArXiv.
[11] Xiao Ma, et al. Contrastive Variational Model-Based Reinforcement Learning for Complex Observations, 2020, ArXiv.
[12] Sergey Levine, et al. Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?, 2020, ICML.
[13] Jasper Snoek, et al. Hyperparameter Ensembles for Robustness and Uncertainty Quantification, 2020, NeurIPS.
[14] Yuval Tassa, et al. dm_control: Software and Tasks for Continuous Control, 2020, Softw. Impacts.
[15] José Miguel Hernández-Lobato, et al. Depth Uncertainty in Neural Networks, 2020, NeurIPS.
[16] Michael W. Dusenberry, et al. Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors, 2020, ICML.
[17] Pieter Abbeel, et al. Planning to Explore via Self-Supervised World Models, 2020, ICML.
[18] Tim Rocktäschel, et al. RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments, 2020, ICLR.
[19] Pavel Izmailov, et al. Bayesian Deep Learning and a Probabilistic Perspective of Generalization, 2020, NeurIPS.
[20] Fabio Viola, et al. Value-driven Hindsight Modelling, 2020, NeurIPS.
[21] Krzysztof Choromanski, et al. Ready Policy One: World Building Through Active Learning, 2020, ICML.
[22] Jimmy Ba, et al. Dream to Control: Learning Behaviors by Latent Imagination, 2019, ICLR.
[23] J. Schulman, et al. Leveraging Procedural Generation to Benchmark Reinforcement Learning, 2019, ICML.
[24] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[25] Matteo Hessel, et al. Off-Policy Actor-Critic with Shared Experience Replay, 2019, ICML.
[26] Rishabh Agarwal, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2019, ICML.
[27] Sergey Levine, et al. Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model, 2019, NeurIPS.
[28] Mohamed H. Zaki, et al. Uncertainty in Neural Networks: Approximately Bayesian Ensembling, 2018, AISTATS.
[29] Martin A. Riedmiller, et al. Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models, 2019, CoRL.
[30] Yarin Gal, et al. Generalizing from a few environments in safety-critical reinforcement learning, 2019, ArXiv.
[31] Aaron van den Oord, et al. Shaping Belief States with Generative Environment Models for RL, 2019, NeurIPS.
[32] Deepak Pathak, et al. Self-Supervised Exploration via Disagreement, 2019, ICML.
[33] Tian Tian, et al. MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments, 2019.
[34] Yoshua Bengio, et al. Hyperbolic Discounting and Learning over Multiple Horizons, 2019, ArXiv.
[35] Andrew Gordon Wilson, et al. A Simple Baseline for Bayesian Uncertainty in Deep Learning, 2019, NeurIPS.
[36] Ruben Villegas, et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.
[37] Wojciech Jaskowski, et al. Model-Based Active Exploration, 2018, ICML.
[38] Sham M. Kakade, et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control, 2018, ICLR.
[39] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[40] Myle Ott, et al. Understanding Back-Translation at Scale, 2018, EMNLP.
[41] Honglak Lee, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[42] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[43] Sergey Levine, et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[44] Pieter Abbeel, et al. Model-Ensemble Trust-Region Policy Optimization, 2018, ICLR.
[45] Shimon Whiteson, et al. TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning, 2017, ICLR.
[46] Michael I. Jordan, et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, 2018, ArXiv.
[47] Gabriel Kalweit, et al. Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning, 2017, CoRL.
[48] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[49] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[50] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[51] Kilian Q. Weinberger, et al. Snapshot Ensembles: Train 1, get M for free, 2017, ICLR.
[52] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[53] Charles Blundell, et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2016, NIPS.
[54] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[55] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[56] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[57] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[58] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[59] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[60] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[61] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[62] J. Schreiber. Foundations of Statistics, 2016.
[63] C. Rasmussen, et al. Improving PILCO with Bayesian Neural Network Dynamics Models, 2016.
[64] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.
[65] Julien Cornebise, et al. Weight Uncertainty in Neural Network, 2015, ICML.
[66] Sergey Levine, et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015, ArXiv.
[67] Martin A. Riedmiller, et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images, 2015, NIPS.
[68] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[69] Friedhelm Schwenker, et al. Neural Network Ensembles in Reinforcement Learning, 2013, Neural Processing Letters.
[70] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[71] Pierre-Yves Oudeyer, et al. Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, 2012, NIPS.
[72] Ondrej Bojar, et al. Improving Translation Model by Monolingual Data, 2011, WMT@EMNLP.
[73] Patrick M. Pilarski, et al. Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[74] Jürgen Schmidhuber, et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), 2010, IEEE Transactions on Autonomous Mental Development.
[75] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[76] John D. Hunter, et al. Matplotlib: A 2D Graphics Environment, 2007, Computing in Science & Engineering.
[77] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[78] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[79] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[80] Marcus Hutter. Simulation Algorithms for Computational Systems Biology, 2017, Texts in Theoretical Computer Science. An EATCS Series.
[81] Jörg D. Wichard, et al. Building Ensembles with Heterogeneous Models, 2003.
[82] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[83] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.
[84] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[85] Robert Tibshirani, et al. A Comparison of Some Error Estimates for Neural Network Models, 1996, Neural Computation.
[86] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[87] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bull.
[88] Jürgen Schmidhuber, et al. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments, 1990, IJCNN International Joint Conference on Neural Networks.
[89] Manfred Morari, et al. Model predictive control: Theory and practice - A survey, 1989, Autom.
[90] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[91] Christos H. Papadimitriou, et al. Games against nature, 1985, 24th Annual Symposium on Foundations of Computer Science (FOCS 1983).
[92] R. Bellman. A Markovian Decision Process, 1957.