Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
Dhruv Malik | Ashwin Pananjady | Kush Bhatia | Koulik Khamaru | Peter L. Bartlett | Martin J. Wainwright
[1] R. E. Kalman, et al. Contributions to the Theory of Optimal Control, 1960.
[2] Boris Polyak. Gradient methods for solving equations and inequalities, 1964.
[3] F. T. Wright, et al. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables, 1971.
[4] F. T. Wright. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables Whose Distributions are not Necessarily Symmetric, 1973.
[5] R. Durrett. Probability: Theory and Examples, 1993.
[6] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[7] Peter Whittle, et al. Optimal Control: Basics and Beyond, 1996.
[8] Claude-Nicolas Fiechter, et al. PAC adaptive control of linear systems, 1997, COLT '97.
[9] T. Başar. Contributions to the Theory of Optimal Control, 2001.
[10] Tim Hesterberg, et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, 2004, Technometrics.
[11] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[12] Eli Upfal, et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.
[13] James C. Spall. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, 2007.
[14] Lin Xiao, et al. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback, 2010, COLT.
[15] Carl E. Rasmussen, et al. Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning, 2011, Robotics: Science and Systems.
[16] Sham M. Kakade, et al. A tail inequality for quadratic forms of subgaussian random vectors, 2011, ArXiv.
[17] Csaba Szepesvári, et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems, 2011, COLT.
[18] Robert D. Nowak, et al. Query Complexity of Derivative-Free Optimization, 2012, NIPS.
[19] Adel Javanmard, et al. Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems, 2012, NIPS.
[20] Biao Huang, et al. System Identification, 2000, Control Theory for Physicists.
[21] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[22] Zhengtao Ding. Adaptive control of linear systems, 2013.
[23] Ohad Shamir, et al. On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization, 2012, COLT.
[24] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[25] Martin J. Wainwright, et al. Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations, 2013, IEEE Transactions on Information Theory.
[26] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[27] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[28] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[29] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[30] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[31] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[32] Ambuj Tewari, et al. Finite Time Analysis of Optimal Adaptive Policies for Linear-Quadratic Systems, 2017, ArXiv.
[33] Yurii Nesterov, et al. Random Gradient-Free Minimization of Convex Functions, 2015, Foundations of Computational Mathematics.
[34] Ohad Shamir. An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback, 2015, J. Mach. Learn. Res.
[35] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[36] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[37] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[38] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[39] Shipra Agrawal, et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds, 2017, NIPS.
[40] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[41] Yan Shuo Tan, et al. Phase Retrieval via Randomized Kaczmarz: Theoretical Guarantees, 2017, ArXiv.
[42] Krishnakumar Balasubramanian, et al. Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates, 2018, NeurIPS.
[43] Benjamin Recht, et al. Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator, 2017, ICML.
[44] Alessandro Lazaric, et al. Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems, 2018, ICML.
[45] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
[46] Nikolai Matni, et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator, 2018, NeurIPS.
[47] Benjamin Recht, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, ArXiv.
[48] Sivaraman Balakrishnan, et al. Stochastic Zeroth-order Optimization in High Dimensions, 2017, AISTATS.
[49] Avinatan Hassidim, et al. Online Linear Quadratic Control, 2018, ICML.
[50] Nevena Lazic, et al. Regret Bounds for Model-Free Linear Quadratic Control, 2018, ArXiv.
[51] Sivaraman Balakrishnan, et al. Optimization of Smooth Functions With Noisy Observations: Local Minimax Rates, 2018, IEEE Transactions on Information Theory.
[52] Yishay Mansour, et al. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret, 2019, ICML.
[53] Benjamin Recht, et al. The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint, 2018, COLT.
[54] Nevena Lazic, et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction, 2018, AISTATS.
[55] Michael I. Jordan, et al. A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm, 2019, ArXiv.
[56] Nikolai Matni, et al. On the Sample Complexity of the Linear Quadratic Regulator, 2017, Foundations of Computational Mathematics.