Sepp Hochreiter | Lukas Gruber | Jose A. Arjona-Medina | Johannes Brandstetter | Markus Holzleitner
[1] Harold J. Kushner, et al. Stochastic approximation methods for constrained and unconstrained systems, 1978.
[2] Frank Fallside, et al. Dynamic reinforcement driven error propagation networks with application to game playing, 1989.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[5] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[6] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[7] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim..
[8] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim..
[9] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control. Optim..
[10] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[11] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[12] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[13] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Martin Hairer, et al. Ergodic Properties of Markov Processes, 2006.
[16] Pierre-Antoine Absil, et al. On the stable equilibrium points of gradient systems, 2006, Syst. Control. Lett..
[17] H. Robbins. A Stochastic Approximation Method, 1951.
[18] B. Bakker, et al. Reinforcement learning by backpropagation through an LSTM model/critic, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[19] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[20] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[21] U. Rieder, et al. Markov Decision Processes, 2010.
[22] Shalabh Bhatnagar, et al. Stochastic Recursive Algorithms for Optimization, 2012.
[23] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[24] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[25] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[26] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[27] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[28] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.
[29] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[30] Shalabh Bhatnagar, et al. Two Timescale Stochastic Approximation with Controlled Markov Noise, 2015, Math. Oper. Res..
[31] Yoshua Bengio, et al. Depth with Nonlinearity Creates No Bad Local Minima in ResNets, 2019, Neural Networks.
[32] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
[33] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[34] Michael I. Jordan, et al. Minmax Optimization: Stable Limit Points of Gradient Descent Ascent are Locally Optimal, 2019, ArXiv.
[35] Leslie Pack Kaelbling, et al. Effect of Depth and Width on Local Minima in Deep Learning, 2018, Neural Computation.
[36] Jakub W. Pachocki, et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, ArXiv.
[37] Sepp Hochreiter, et al. RUDDER: Return Decomposition for Delayed Rewards, 2018, NeurIPS.
[38] S. Shankar Sastry, et al. On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games, 2019, ArXiv:1901.00838.
[39] Yongxin Chen, et al. Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost, 2019, NeurIPS.
[40] Yingbin Liang, et al. Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples, 2019, NeurIPS.
[41] Sepp Hochreiter, et al. Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution, 2020, ArXiv.
[42] Michael I. Jordan, et al. On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems, 2019, ICML.
[43] Volkan Cevher, et al. On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems, 2020, NeurIPS.
[44] Zhuoran Yang, et al. A Theoretical Analysis of Deep Q-Learning, 2019, L4DC.