Jeongyeol Kwon | Yonathan Efroni | Constantine Caramanis | Shie Mannor
[1] Roman Vershynin et al. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.
[2] Peter Buchholz et al. Computation of weighted sums of rewards for concurrent MDPs, 2018, Math. Methods Oper. Res.
[3] Lihong Li et al. Sample Complexity of Multi-task Reinforcement Learning, 2013, UAI.
[4] O. Cappé et al. On-line expectation-maximization algorithm for latent data models, 2009.
[5] John Langford et al. PAC Reinforcement Learning with Rich Observations, 2016, NIPS.
[6] Constantine Caramanis et al. EM Converges for a Mixture of Many Linear Regressions, 2019, AISTATS.
[7] Brian T. Denton et al. Multi-model Markov decision processes, 2021, IISE Trans.
[8] Nikos A. Vlassis et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.
[9] Byron Boots et al. An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems, 2011, AAAI.
[10] Nan Jiang et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[11] S. Kakade et al. Sample-Efficient Reinforcement Learning of Undercomplete POMDPs, 2020, NeurIPS.
[12] Emma Brunskill et al. A PAC RL Algorithm for Episodic POMDPs, 2016, AISTATS.
[13] Nan Jiang et al. On Oracle-Efficient PAC RL with Rich Observations, 2018, NeurIPS.
[14] Michael L. Littman et al. Memoryless policies: theoretical limitations and practical results, 1994.
[15] Reid G. Simmons et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.
[16] John N. Tsitsiklis et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[17] Nan Jiang et al. Markov Decision Processes with Continuous Side Information, 2017, ALT.
[18] Rémi Munos et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[19] Yao Liu et al. PAC Continuous State Online Multitask Reinforcement Learning with Identification, 2016, AAMAS.
[20] Nan Jiang et al. Improving Predictive State Representations via Gradient Descent, 2016, AAAI.
[21] Masoumeh T. Izadi et al. Sensitivity Analysis of POMDP Value Functions, 2009, International Conference on Machine Learning and Applications.
[22] Aurélien Garivier et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016, Math. Oper. Res.
[23] Olivier Buffet et al. MOMDPs: A Solution for Modelling Adaptive Management Problems, 2012, AAAI.
[24] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[25] Anima Anandkumar et al. A Method of Moments for Mixture Models and Hidden Markov Models, 2012, COLT.
[26] Michael R. James et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.
[27] Michael I. Jordan et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[28] Shie Mannor et al. Latent Bandits, 2014, ICML.
[29] Constantine Caramanis et al. On the Minimax Optimality of the EM Algorithm for Learning Two-Component Mixed Linear Regression, 2020, AISTATS.
[30] Sergei Vassilvitskii et al. k-means++: the advantages of careful seeding, 2007, SODA.
[31] Shie Mannor et al. Contextual Markov Decision Processes, 2015, arXiv.
[32] Hongsheng Xi et al. Finding optimal memoryless policies of POMDPs under the expected average reward criterion, 2011, Eur. J. Oper. Res.
[33] Kamyar Azizzadenesheli et al. Reinforcement Learning of POMDPs using Spectral Methods, 2016, COLT.
[34] Geoffrey J. Gordon et al. Supervised Learning for Dynamical System Learning, 2015, NIPS.
[35] Joelle Pineau et al. Anytime Point-Based Approximations for Large POMDPs, 2006, J. Artif. Intell. Res.
[36] Anima Anandkumar et al. Tensor decompositions for learning latent variable models, 2012, J. Mach. Learn. Res.
[37] Byron Boots et al. Closing the learning-planning loop with predictive state representations, 2009, Int. J. Robotics Res.
[38] Michael I. Jordan et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[39] Edward J. Sondik et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[40] Sham M. Kakade et al. A spectral algorithm for learning Hidden Markov Models, 2008, J. Comput. Syst. Sci.
[41] Richard S. Sutton et al. Predictive Representations of State, 2001, NIPS.
[42] Nan Jiang et al. Provably efficient RL with Rich Observations via Latent State Decoding, 2019, ICML.
[43] Peter Stone et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.
[44] Peter Auer et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[45] Shuai Li et al. Online Clustering of Bandits, 2014, ICML.
[46] Constantine Caramanis et al. The EM Algorithm gives Sample-Optimality for Learning Mixtures of Well-Separated Gaussians, 2020, COLT.
[47] V. N. Bogaevski et al. Matrix Perturbation Theory, 1991.
[48] Shuai Li et al. On Context-Dependent Clustering of Bandits, 2016, ICML.
[49] Leslie Pack Kaelbling et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.