[1] Peter Buchholz, et al. Computation of weighted sums of rewards for concurrent MDPs, 2018, Math. Methods Oper. Res.
[2] Aurélien Garivier, et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016, Math. Oper. Res.
[3] W. S. Merwin. At the Same Time, 1971.
[4] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[5] Benjamin Van Roy, et al. Model-based Reinforcement Learning and the Eluder Dimension, 2014, NIPS.
[6] Akshay Krishnamurthy, et al. Reward-Free Exploration for Reinforcement Learning, 2020, ICML.
[7] John Langford, et al. PAC Reinforcement Learning with Rich Observations, 2016, NIPS.
[8] Yishay Mansour, et al. Estimating a mixture of two product distributions, 1999, COLT.
[9] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[10] Olivier Buffet, et al. MOMDPs: A Solution for Modelling Adaptive Management Problems, 2012, AAAI.
[11] Haipeng Luo, et al. Learning Adversarial MDPs with Bandit Feedback and Unknown Transition, 2019, arXiv.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] S. Kakade, et al. Sample-Efficient Reinforcement Learning of Undercomplete POMDPs, 2020, NeurIPS.
[14] Benjamin Van Roy, et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning?, 2016, ICML.
[15] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[16] Shie Mannor, et al. Latent Bandits, 2014, ICML.
[17] Max Simchowitz, et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.
[18] Sanjoy Dasgupta, et al. Learning mixtures of Gaussians, 1999, FOCS.
[19] Ruosong Wang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2020, ICLR.
[20] Sham M. Kakade, et al. A spectral algorithm for learning Hidden Markov Models, 2008, J. Comput. Syst. Sci.
[21] Constantine Caramanis, et al. On the Minimax Optimality of the EM Algorithm for Learning Two-Component Mixed Linear Regression, 2020, AISTATS.
[22] Nan Jiang, et al. Provably Efficient RL with Rich Observations via Latent State Decoding, 2019, ICML.
[23] Daniel M. Kane, et al. Robust Estimators in High Dimensions without the Computational Intractability, 2016, FOCS.
[24] Lihong Li, et al. Sample Complexity of Multi-task Reinforcement Learning, 2013, UAI.
[25] Constantine Caramanis, et al. Convex and Nonconvex Formulations for Mixed Regression with Two Components: Minimax Optimal Rates, 2018, IEEE Trans. Inf. Theory.
[26] Prateek Jain, et al. Learning Mixtures of Discrete Product Distributions using Spectral Decompositions, 2013, COLT.
[27] Rocco A. Servedio, et al. Learning mixtures of structured distributions over discrete domains, 2012, SODA.
[28] Emma Brunskill, et al. A PAC RL Algorithm for Episodic POMDPs, 2016, AISTATS.
[29] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[30] Yin Tat Lee, et al. Solving linear programs in the current matrix multiplication time, 2018, STOC.
[31] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[32] Benjamin Van Roy, et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration, 2013, NIPS.
[33] Nan Jiang, et al. On Oracle-Efficient PAC RL with Rich Observations, 2018, NeurIPS.
[34] Constantine Caramanis, et al. Learning Mixtures of Graphs from Epidemic Cascades, 2019, ICML.
[35] András György, et al. The adversarial stochastic shortest path problem with unknown transition probabilities, 2012, AISTATS.
[36] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[37] Robert E. Tarjan, et al. A Linear-Time Algorithm for Testing the Truth of Certain Quantified Boolean Formulas, 1979, Inf. Process. Lett.
[38] J. Feldman, et al. Learning mixtures of product distributions over discrete domains, 2005, FOCS.
[39] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[40] Kamyar Azizzadenesheli, et al. Reinforcement Learning of POMDPs using Spectral Methods, 2016, COLT.
[41] Christos Dimitrakakis, et al. Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities, 2019, arXiv.
[42] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2008, Math. Oper. Res.
[43] Brian T. Denton, et al. Multi-model Markov decision processes, 2021, IISE Trans.
[44] Nan Jiang, et al. Contextual Decision Processes with Low Bellman Rank are PAC-Learnable, 2016, ICML.
[45] R. D. Veaux, et al. Mixtures of linear regressions, 1989.
[46] Anders Jonsson, et al. Adaptive Reward-Free Exploration, 2020, arXiv.
[47] Joelle Pineau, et al. Anytime Point-Based Approximations for Large POMDPs, 2006, J. Artif. Intell. Res.
[48] Martin J. Wainwright. High-Dimensional Statistics, 2019.
[49] Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2009, Int. J. Robotics Res.