Le Song | Bo Dai | Lihong Li | Lin Xiao | Jianshu Chen | Niao He | Albert Shaw
[1] Sean P. Meyn, et al. An analysis of reinforcement learning with function approximation, 2008, ICML '08.
[2] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[3] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[4] Lihong Li, et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.
[5] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[6] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[7] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] Kavosh Asadi, et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.
[9] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[10] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[11] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[12] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[13] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[14] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[15] Ali H. Sayed, et al. Distributed Policy Evaluation Under Multiple Behavior Strategies, 2013, IEEE Transactions on Automatic Control.
[16] Dimitri P. Bertsekas, et al. Nonlinear Programming, 1997.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Dirk Ormoneit, et al. Kernel-Based Reinforcement Learning, 2017, Encyclopedia of Machine Learning and Data Mining.
[19] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[20] David Haussler, et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.
[21] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[22] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[23] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[24] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[25] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.
[26] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[27] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[28] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.
[29] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[30] Mengdi Wang, et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear Running Time, 2017, ArXiv.
[31] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[32] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[33] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[34] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[35] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[36] Xiaohong Chen, et al. Mixing and Moment Properties of Various GARCH and Stochastic Volatility Models, 2002, Econometric Theory.
[37] Yang Liu, et al. Stein Variational Policy Gradient, 2017, UAI.
[38] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[39] Kavosh Asadi, et al. A New Softmax Operator for Reinforcement Learning, 2016, ArXiv.
[40] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[41] Naftali Tishby, et al. Trading Value and Information in MDPs, 2012.
[42] Mengdi Wang, et al. Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning, 2016, ArXiv.
[43] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[44] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[45] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[46] Yurii Nesterov, et al. Smooth minimization of non-smooth functions, 2005, Math. Program.
[47] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[48] Le Song, et al. Learning from Conditional Distributions via Dual Kernel Embeddings, 2016, ArXiv.
[49] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[50] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[51] Francis R. Bach, et al. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res.
[52] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[53] V. Borkar. Stochastic approximation with two time scales, 1997.
[54] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[55] Alex M. Andrew, et al. Reinforcement Learning: An Introduction, 1998.
[56] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[57] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[58] Le Song, et al. Learning from Conditional Distributions via Dual Embeddings, 2016, AISTATS.
[59] Benjamin Van Roy, et al. On the existence of fixed points for approximate value iteration and temporal-difference learning, 2000.
[60] Saeed Ghadimi, et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.
[61] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[62] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[63] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[64] Bo Liu. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2019.
[65] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[66] Le Song, et al. Boosting the Actor with Dual Critic, 2017, ICLR.
[67] Le Song, et al. Scalable Kernel Methods via Doubly Stochastic Gradients, 2014, NIPS.
[68] Emanuel Todorov, et al. Linearly-solvable Markov decision problems, 2006, NIPS.