[1] G. Box,et al. On the Experimental Attainment of Optimum Conditions , 1951 .
[2] R. L. Stratonovich. CONDITIONAL MARKOV PROCESSES , 1960 .
[3] Reuven Y. Rubinstein,et al. Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.
[4] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[5] PAUL J. WERBOS,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.
[6] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[7] Jürgen Schmidhuber,et al. Networks adjusting networks , 1990 .
[8] Sepp Hochreiter,et al. Untersuchungen zu dynamischen neuronalen Netzen , 1991 .
[9] Jürgen Schmidhuber,et al. Learning to Generate Artificial Fovea Trajectories for Target Detection , 1991, Int. J. Neural Syst..
[10] Jürgen Schmidhuber,et al. Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks , 1992, Neural Computation.
[11] Jürgen Schmidhuber,et al. A ‘Self-Referential’ Weight Matrix , 1993 .
[12] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[13] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[14] Andrew W. Moore,et al. Memory-based Stochastic Optimization , 1995, NIPS.
[15] Ronald J. Williams,et al. Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .
[16] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[17] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[18] John E. Dennis,et al. Optimization Using Surrogate Objectives on a Helicopter Test Example , 1998 .
[19] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[20] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[21] Herbert Jaeger,et al. The "echo state" approach to analysing and training recurrent neural networks , 2001 .
[22] Donald R. Jones,et al. A Taxonomy of Global Optimization Methods Based on Response Surfaces , 2001, J. Glob. Optim..
[23] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[24] Henry Markram,et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations , 2002, Neural Computation.
[25] Jürgen Schmidhuber,et al. A robot that reinforcement-learns to identify and memorize important previous observations , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).
[26] C. Malsburg. Self-organization of orientation sensitive cells in the striate cortex , 2004, Kybernetik.
[27] A. Rollett,et al. The Monte Carlo Method , 2004 .
[28] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[29] Jürgen Schmidhuber,et al. Modeling systems with internal state using evolino , 2005, GECCO '05.
[30] Jürgen Schmidhuber,et al. Evolino: Hybrid Neuroevolution / Optimal Linear Search for Sequence Prediction , 2005, IJCAI 2005.
[31] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[32] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[33] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[34] Jürgen Schmidhuber,et al. Training Recurrent Networks by Evolino , 2007, Neural Computation.
[35] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[36] Frank Sehnke,et al. Policy Gradients with Parameter-Based Exploration for Control , 2008, ICANN.
[37] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[38] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[39] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .
[40] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[41] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[42] Yishay Mansour,et al. Learning Bounds for Importance Weighting , 2010, NIPS.
[43] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[44] Jürgen Schmidhuber,et al. Recurrent policy gradients , 2010, Log. J. IGPL.
[45] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[46] Frank Sehnke,et al. Parameter-exploring policy gradients , 2010, Neural Networks.
[47] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[48] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[49] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[50] Jasper Snoek,et al. Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.
[51] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[52] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[53] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[54] Prabhat,et al. Scalable Bayesian Optimization Using Deep Neural Networks , 2015, ICML.
[55] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[56] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[57] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[58] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[59] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[60] Alex Graves,et al. Decoupled Neural Interfaces using Synthetic Gradients , 2016, ICML.
[61] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[62] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[63] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[64] Benjamin Recht,et al. Simple random search of static linear policies is competitive for reinforcement learning , 2018, NeurIPS.
[65] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[66] Hamid Reza Maei,et al. Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation , 2018, ArXiv.
[67] Marcello Restelli,et al. Policy Optimization via Importance Sampling , 2018, NeurIPS.
[68] Martha White,et al. An Off-policy Policy Gradient Theorem Using Emphatic Weightings , 2018, NeurIPS.
[69] J. Schmidhuber. Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments , 2018 .
[70] Ilya Kostrikov,et al. AlgaeDICE: Policy Gradient from Arbitrary Experience , 2019, ArXiv.
[71] Emma Brunskill,et al. Off-Policy Policy Gradient with State Distribution Correction , 2019, UAI 2019.
[72] Daniel Keysers,et al. Predicting Neural Network Accuracy from Weights , 2020, ArXiv.
[73] Tom Schaul,et al. Policy Evaluation Networks , 2020, ArXiv.
[74] Luca Martino,et al. Advances in Importance Sampling , 2021, Wiley StatsRef: Statistics Reference Online.