[1] Ronald J. Williams,et al. Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .
[2] Hamid Reza Maei,et al. Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation , 2018, ArXiv.
[3] Jürgen Schmidhuber,et al. Evolino: Hybrid Neuroevolution / Optimal Linear Search for Sequence Prediction , 2005, IJCAI 2005.
[4] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[5] Benjamin Recht,et al. Simple random search of static linear policies is competitive for reinforcement learning , 2018, NeurIPS.
[6] Yishay Mansour,et al. Learning Bounds for Importance Weighting , 2010, NIPS.
[7] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[8] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[9] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[10] J. Schmidhuber. Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments , 2018 .
[11] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[12] Sepp Hochreiter,et al. Untersuchungen zu dynamischen neuronalen Netzen , 1991 .
[13] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[14] Jürgen Schmidhuber,et al. Networks adjusting networks , 1990 .
[15] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[16] Prabhat,et al. Scalable Bayesian Optimization Using Deep Neural Networks , 2015, ICML.
[17] Marcello Restelli,et al. Policy Optimization via Importance Sampling , 2018, NeurIPS.
[18] Paul J. Werbos,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.
[19] Jürgen Schmidhuber,et al. Training Recurrent Networks by Evolino , 2007, Neural Computation.
[20] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[21] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[22] Jürgen Schmidhuber,et al. A ‘Self-Referential’ Weight Matrix , 1993 .
[23] Henry Markram,et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations , 2002, Neural Computation.
[24] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[25] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[26] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[27] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[28] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[29] Frank Sehnke,et al. Policy Gradients with Parameter-Based Exploration for Control , 2008, ICANN.
[30] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[31] R. L. Stratonovich. Conditional Markov Processes , 1960 .
[32] Jürgen Schmidhuber,et al. A robot that reinforcement-learns to identify and memorize important previous observations , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).
[33] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[34] Tom Schaul,et al. Policy Evaluation Networks , 2020, ArXiv.
[35] Jürgen Schmidhuber,et al. Learning to Generate Artificial Fovea Trajectories for Target Detection , 1991, Int. J. Neural Syst..
[36] Jürgen Schmidhuber,et al. Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks , 1992, Neural Computation.
[37] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[38] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[39] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[40] C. Malsburg. Self-organization of orientation sensitive cells in the striate cortex , 2004, Kybernetik.
[41] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[42] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[43] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[44] A. Rollett,et al. The Monte Carlo Method , 2004 .
[45] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[46] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[47] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[48] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[49] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[50] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[51] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[52] Jasper Snoek,et al. Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.
[53] Daniel Keysers,et al. Predicting Neural Network Accuracy from Weights , 2020, ArXiv.
[54] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[55] Jürgen Schmidhuber,et al. Modeling systems with internal state using evolino , 2005, GECCO '05.
[56] Jürgen Schmidhuber,et al. Recurrent policy gradients , 2010, Log. J. IGPL.
[57] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[58] G. Box,et al. On the Experimental Attainment of Optimum Conditions , 1951 .
[59] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .
[60] John E. Dennis,et al. Optimization Using Surrogate Objectives on a Helicopter Test Example , 1998 .
[61] Herbert Jaeger,et al. The 'echo state' approach to analysing and training recurrent neural networks , 2001 .
[62] Donald R. Jones,et al. A Taxonomy of Global Optimization Methods Based on Response Surfaces , 2001, J. Glob. Optim..
[63] Alex Graves,et al. Decoupled Neural Interfaces using Synthetic Gradients , 2016, ICML.
[64] Frank Sehnke,et al. Parameter-exploring policy gradients , 2010, Neural Networks.
[65] Andrew W. Moore,et al. Memory-based Stochastic Optimization , 1995, NIPS.