The Phenomenon of Policy Churn