Provably Efficient Algorithms for Multi-Objective Competitive RL
Suvrit Sra | Tiancheng Yu | Jingzhao Zhang | Yi Tian
[1] Benjamin Van Roy, et al. Near-optimal Reinforcement Learning in Factored MDPs, 2014, NIPS.
[2] Ness B. Shroff, et al. Learning in Markov Decision Processes under Constraints, 2020, ArXiv.
[3] Max Simchowitz, et al. Constrained Episodic Reinforcement Learning in Concave-Convex and Knapsack Settings, 2020, NeurIPS.
[4] Chi Jin, et al. Near-Optimal Reinforcement Learning with Self-Play, 2020, NeurIPS.
[5] Suvrit Sra, et al. Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes, 2020, NeurIPS.
[6] Shie Mannor, et al. Exploration-Exploitation in Constrained MDPs, 2020, ArXiv.
[7] Michal Valko, et al. Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited, 2021, ALT.
[8] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[9] David Simchi-Levi, et al. Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism, 2019.
[10] Tiancheng Yu, et al. Provably Efficient Online Agnostic Learning in Markov Games, 2020, ArXiv.
[11] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[12] Tōru Maruyama. Some Recent Developments in Convex Analysis (in Japanese), 1977.
[13] D. Blackwell. An Analog of the Minimax Theorem for Vector Payoffs, 1956.
[14] Xiaohan Wei, et al. Provably Efficient Safe Exploration via Primal-Dual Policy Optimization, 2021, AISTATS.
[15] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[16] Qinghua Liu, et al. A Sharp Analysis of Model-based Reinforcement Learning with Self-Play, 2020, ICML.
[17] Chi Jin, et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning, 2020, ICML.
[18] J. von Neumann. Zur Theorie der Gesellschaftsspiele (On the Theory of Games of Strategy), 1928.
[19] Peter L. Bartlett, et al. Blackwell Approachability and No-Regret Learning Are Equivalent, 2010, COLT.
[20] Lihong Li, et al. Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL, 2020, ICLR.
[21] Aleksandrs Slivkins, et al. Bandits with Knapsacks, 2013, IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS).
[22] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[23] Shie Mannor, et al. Approachability in Unknown Games: Online Learning Meets Multi-Objective Optimization, 2014, COLT.
[24] Jianjun Yuan, et al. Online Convex Optimization for Cumulative Constraints, 2018, NeurIPS.
[25] Nikhil R. Devanur, et al. Bandits with Concave Rewards and Convex Knapsacks, 2014, EC.
[26] Akshay Krishnamurthy, et al. Reward-Free Exploration for Reinforcement Learning, 2020, ICML.
[27] Cédric Archambeau, et al. Adaptive Algorithms for Online Convex Optimization with Long-term Constraints, 2015, ICML.
[28] Nahum Shimkin, et al. An Online Convex Optimization Approach to Blackwell's Approachability, 2015, J. Mach. Learn. Res.
[29] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[30] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[31] Qiaomin Xie, et al. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium, 2020, COLT.
[32] Lin F. Yang, et al. Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning, 2020, NeurIPS.
[33] Miroslav Dudík, et al. Reinforcement Learning with Convex Constraints, 2019, NeurIPS.