[1] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[2] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[3] Nicolò Cesa-Bianchi,et al. Bandits With Heavy Tail , 2012, IEEE Transactions on Information Theory.
[4] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[5] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[6] D. Lindley. Kendall's Advanced Theory of Statistics, volume 2B, Bayesian Inference, 2nd edn , 2005.
[7] Shie Mannor,et al. Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning , 2018, NeurIPS.
[8] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[9] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[10] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933.
[11] Stephen P. Boyd,et al. CVXPY: A Python-Embedded Modeling Language for Convex Optimization , 2016, J. Mach. Learn. Res.
[12] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[13] Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.
[14] Ye Zhang,et al. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification , 2015, IJCNLP.
[15] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[16] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res.
[17] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[18] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[19] Shie Mannor,et al. Visualizing Dynamics: from t-SNE to SEMI-MDPs , 2016, ArXiv.
[20] Kamyar Azizzadenesheli,et al. Efficient Exploration Through Bayesian Deep Q-Networks , 2018, 2018 Information Theory and Applications Workshop (ITA).
[21] Shie Mannor,et al. Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce , 2016, AAAI 2016.
[22] Andrew R. Barron,et al. Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.
[23] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[24] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn.
[25] Max Welling,et al. Distributed Inference for Latent Dirichlet Allocation , 2007, NIPS.
[26] Shie Mannor,et al. Graying the black box: Understanding DQNs , 2016, ICML.
[27] Toniann Pitassi,et al. The reusable holdout: Preserving validity in adaptive data analysis , 2015, Science.
[28] Aurélien Garivier,et al. Parametric Bandits: The Generalized Linear Case , 2010, NIPS.
[29] Shie Mannor,et al. Thompson Sampling for Complex Online Problems , 2013, ICML.
[30] Shie Mannor,et al. Shallow Updates for Deep Reinforcement Learning , 2017, NIPS.
[31] Ian Osband,et al. The Uncertainty Bellman Equation and Exploration , 2017, ICML.
[32] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[33] Alessandro Lazaric,et al. Linear Thompson Sampling Revisited , 2016, AISTATS.
[34] Jasper Snoek,et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling , 2018, ICLR.