[1] T. Power. Play and Exploration in Children and Animals, 1999.
[2] Xia Hu, et al. Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments, 2021, ICLR.
[3] Charles Blundell, et al. Coverage as a Principle for Discovering Transferable Behavior in Reinforcement Learning, 2021, ArXiv.
[4] Shie Mannor, et al. Adaptive Skills Adaptive Partitions (ASAP), 2016, NIPS.
[5] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[6] Michael Buro, et al. Build Order Optimization in StarCraft, 2011, AIIDE.
[7] Shie Mannor, et al. A Bayesian Approach to Robust Reinforcement Learning, 2019, UAI.
[8] S. Nelson, et al. Homeostatic plasticity in the developing nervous system, 2004, Nature Reviews Neuroscience.
[9] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[10] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[11] C. Breazeal, et al. Experiments in socially guided exploration: lessons learned in building robots that learn with and without human teachers, 2008, Connect. Sci.
[12] John Langford, et al. Efficient Exploration in Reinforcement Learning, 2010, Encyclopedia of Machine Learning.
[13] R. Becket Ebitz, et al. Tonic exploration governs both flexibility and lapses, 2019, PLoS Comput. Biol.
[14] J. Peters, et al. Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making, 2019, bioRxiv.
[15] Marlos C. Machado, et al. Exploration in Reinforcement Learning with Deep Covering Options, 2020, ICLR.
[16] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[17] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[18] Tom Schaul, et al. Adapting Behaviour for Learning Progress, 2019, ArXiv.
[19] Daan Wierstra, et al. Variational Intrinsic Control, 2016, ICLR.
[20] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[21] George Konidaris, et al. Discovering Options for Exploration by Minimizing Cover Time, 2019, ICML.
[22] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[23] Jürgen Schmidhuber, et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), 2010, IEEE Transactions on Autonomous Mental Development.
[24] Aldo Pacchiano, et al. Deep Reinforcement Learning with Dynamic Optimism, 2021, ArXiv.
[25] Andrew R. Mitz, et al. Subcortical Substrates of Explore-Exploit Decisions in Primates, 2019, Neuron.
[26] Frederic Bartumeus, et al. Bumblebees learn foraging routes through exploitation–exploration cycles, 2019, Journal of the Royal Society Interface.
[27] Angela J. Yu, et al. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration, 2007, Philosophical Transactions of the Royal Society B: Biological Sciences.
[28] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[29] Tom Schaul, et al. Universal Successor Features Approximators, 2018, ICLR.
[30] Rahul Bhui, et al. Structured, uncertainty-driven exploration in real-world consumer choice, 2019, Proceedings of the National Academy of Sciences.
[31] Michel Tokic. Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Differences, 2010.
[32] Pierre-Yves Oudeyer, et al. What is Intrinsic Motivation? A Typology of Computational Approaches, 2007, Frontiers in Neurorobotics.
[33] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, Proceedings of the 1991 IEEE International Joint Conference on Neural Networks.
[34] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[35] Ryutaro Ichise, et al. Fast and slow curiosity for high-level exploration in reinforcement learning, 2020, Appl. Intell.
[36] R. Munos, et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.
[37] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[38] Doina Precup, et al. The Option Keyboard: Combining Skills in Reinforcement Learning, 2021, NeurIPS.
[39] Samuel J. Gershman, et al. Dopaminergic genes are associated with both directed and random exploration, 2018, Neuropsychologia.
[40] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[41] Razvan Pascanu, et al. Temporal Difference Uncertainties as a Signal for Exploration, 2020, ArXiv.
[42] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[43] Filippo Radicchi, et al. Lévy flights in human behavior and cognition, 2013, arXiv:1306.6533.
[44] Georg Ostrovski, et al. Temporally-Extended ε-Greedy Exploration, 2020, ICLR.
[45] Honglak Lee, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[46] Susan L. Franzel, et al. Guided search: an alternative to the feature integration model for visual search, 1989, Journal of Experimental Psychology: Human Perception and Performance.
[47] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[48] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[49] Kenneth O. Stanley, et al. Go-Explore: a New Approach for Hard-Exploration Problems, 2019, ArXiv.
[50] Martha White, et al. Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study, 2019, ArXiv.
[51] Csaba Szepesvári. Bandit Algorithms, 2020.
[52] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[53] Robert C. Wilson, et al. Differential Effects of Psychotic Illness on Directed and Random Exploration, 2020, Computational Psychiatry.
[54] Daniel Guo, et al. Agent57: Outperforming the Atari Human Benchmark, 2020, ICML.
[55] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[56] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[57] Marco Wiering, et al. Ensemble Algorithms in Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[58] Steven Latré, et al. Learning Intrinsically Motivated Options to Stimulate Policy Exploration, 2020.
[59] Thomas T. Hills, et al. Exploration versus exploitation in space, mind, and society, 2015, Trends in Cognitive Sciences.
[60] Terence Hwa, et al. Chemotaxis as a navigation strategy to boost range expansion, 2019, Nature.
[61] Anjali Raja Beharelle, et al. Increased random exploration in schizophrenia is associated with inflammation, 2020, bioRxiv.
[62] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[63] J. Downar, et al. A cortical network sensitive to stimulus salience in a neutral behavioral context across multiple sensory modalities, 2002, Journal of Neurophysiology.
[64] Chrystopher L. Nehaniv, et al. Empowerment: a universal agent-centric measure of control, 2005, 2005 IEEE Congress on Evolutionary Computation.
[65] S. Gershman. Deconstructing the human algorithms for exploration, 2018, Cognition.
[66] Tom Schaul, et al. Return-based Scaling: Yet Another Normalisation Trick for Deep RL, 2021, ArXiv.