暂无分享,去创建一个
[1] Francesco Orabona. A Modern Introduction to Online Learning , 2019, ArXiv.
[2] Aaron C. Courville,et al. Improved Training of Wasserstein GANs , 2017, NIPS.
[3] Jun-Kun Wang,et al. On Frank-Wolfe and Equilibrium Computation , 2017, NIPS.
[4] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[5] Chi Jin,et al. Provably Efficient Exploration in Policy Optimization , 2019, ICML.
[6] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[7] Yufeng Zhang,et al. Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate , 2020, ArXiv.
[8] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[9] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[10] Lin F. Yang,et al. Toward the Fundamental Limits of Imitation Learning , 2020, NeurIPS.
[11] Alexandros Kalousis,et al. Sample-Efficient Imitation Learning via Generative Adversarial Nets , 2018, AISTATS.
[12] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[13] J. Andrew Bagnell,et al. Efficient Reductions for Imitation Learning , 2010, AISTATS.
[14] Robert E. Schapire,et al. A Game-Theoretic Approach to Apprenticeship Learning , 2007, NIPS.
[15] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[16] Tuo Zhao,et al. On Computation and Generalization of Generative Adversarial Imitation Learning , 2020, ICLR.
[17] Manfred K. Warmuth,et al. The Weighted Majority Algorithm , 1994, Inf. Comput..
[18] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[19] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[20] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[21] Stefan Schaal,et al. Learning from Demonstration , 1996, NIPS.
[22] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[23] Pieter Abbeel,et al. Exploration and apprenticeship learning in reinforcement learning , 2005, ICML.
[24] Shie Mannor,et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs , 2020, AAAI.
[25] Mohammad Ghavamzadeh,et al. Mirror Descent Policy Optimization , 2020, ArXiv.
[26] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[27] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[28] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[29] Matthieu Geist,et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search , 2014, ECML/PKDD.
[30] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[31] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[32] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[33] Haim Kaplan,et al. Apprenticeship Learning via Frank-Wolfe , 2019, AAAI.
[34] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim..
[35] Stefano Ermon,et al. Model-Free Imitation Learning with Policy Optimization , 2016, ICML.
[36] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[37] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[38] Y. Freund,et al. Adaptive game playing using multiplicative weights , 1999 .
[39] S. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[40] E. Ordentlich,et al. Inequalities for the L1 Deviation of the Empirical Distribution , 2003 .
[41] Shie Mannor,et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies , 2019, NeurIPS.
[42] Shie Mannor,et al. Optimistic Policy Optimization with Bandit Feedback , 2020, ICML.
[43] Michael H. Bowling,et al. Apprenticeship learning using linear programming , 2008, ICML '08.
[44] Huang Xiao,et al. Wasserstein Adversarial Imitation Learning , 2019, ArXiv.
[45] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[46] Zhiheng Li,et al. Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration , 2020, 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS).