Laurent Orseau | Tor Lattimore | Marcus Hutter | Jan Leike
[1] Marcus Hutter, et al. Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures, 2002, COLT.
[2] Laurent Orseau, et al. Asymptotic non-learnability of universal agents with computable horizon functions, 2013, Theor. Comput. Sci.
[3] R. Lathe. PhD by thesis, 1988, Nature.
[4] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[5] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[6] Jordan Stoyanov, et al. Counterexamples in Probability, 1989.
[7] Benjamin Van Roy, et al. Model-based Reinforcement Learning and the Eluder Dimension, 2014, NIPS.
[8] R. Durrett. Probability: Theory and Examples, 1993.
[9] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[10] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[11] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[12] D. Blackwell, et al. Merging of Opinions with Increasing Information, 1962.
[13] Tor Lattimore, et al. General time consistent discounting, 2014, Theor. Comput. Sci.
[14] Phuong Nguyen, et al. Competing with an Infinite Set of Models in Reinforcement Learning, 2013, AISTATS.
[15] Shane Legg, et al. Universal Intelligence: A Definition of Machine Intelligence, 2007, Minds and Machines.
[16] Marcus Hutter, et al. Discrete MDL Predicts in Total Variation, 2009, NIPS.
[17] Marcus Hutter, et al. Bad Universal Priors and Notions of Optimality, 2015, COLT.
[18] Shie Mannor, et al. Thompson Sampling for Learning Parameterized Markov Decision Processes, 2014, COLT.
[19] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[20] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[21] Tor Lattimore, et al. Asymptotically Optimal Agents, 2011, ALT.
[22] Jordan Stoyanov, et al. Counterexamples in Probability, 1989.
[23] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[24] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[25] Tor Lattimore, et al. Theory of general reinforcement learning, 2014.
[26] K. Pearson. Biometrika, 1902, The American Naturalist.
[27] Marcus Hutter, et al. Rationality, optimism and guarantees in general reinforcement learning, 2015, J. Mach. Learn. Res.
[28] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[29] Daniel A. Braun, et al. A Minimum Relative Entropy Principle for Learning and Acting, 2008, J. Artif. Intell. Res.
[30] Laurent Orseau, et al. Universal Knowledge-Seeking Agents for Stochastic Environments, 2013, ALT.
[31] Marcus Hutter, et al. A Theory of Universal Artificial Intelligence based on Algorithmic Complexity, 2000, ArXiv.
[32] Marcus Hutter. General Discounting Versus Average Reward, 2006, ALT.
[33] Marcus Hutter, et al. Universal Artificial Intelligence - Sequential Decisions Based on Algorithmic Probability, 2005, Texts in Theoretical Computer Science. An EATCS Series.
[34] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.