A study in direct policy search
[1] Jürgen Schmidhuber, et al. Learning Precise Timing with LSTM Recurrent Networks, 2003, J. Mach. Learn. Res.
[2] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[3] Risto Miikkulainen, et al. Incremental Evolution of Complex General Behavior, 1997, Adapt. Behav.
[4] Mauro Birattari, et al. Swarm Intelligence, 2012, Lecture Notes in Computer Science.
[5] Risto Miikkulainen, et al. Efficient Non-linear Control Through Neuroevolution, 2006, ECML.
[6] Pieter Bram Bakker. The state of mind: reinforcement learning with recurrent neural networks, 2004.
[7] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[8] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[9] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[10] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks (2008 Special Issue).
[11] Risto Miikkulainen, et al. Evolving Neural Networks through Augmenting Topologies, 2002, Evolutionary Computation.
[12] J. Schmidhuber, et al. Framewise phoneme classification with bidirectional LSTM networks, 2005, Proceedings of the 2005 IEEE International Joint Conference on Neural Networks.
[13] Nicol N. Schraudolph, et al. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation, 2005, NIPS.
[14] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[15] P. Glynn. Optimization of stochastic systems via simulation, 1989, WSC '89.
[16] Risto Miikkulainen, et al. Solving Non-Markovian Control Tasks with Neuro-Evolution, 1999, IJCAI.
[17] Risto Miikkulainen, et al. Efficient Reinforcement Learning through Symbiotic Evolution, 1996, Machine Learning.
[18] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[19] Andrew McCallum, et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, 1995, ICML.
[20] Christian Igel, et al. Neuroevolution for reinforcement learning using evolution strategies, 2003, Congress on Evolutionary Computation (CEC '03).
[21] J. Baxter, et al. Direct gradient-based reinforcement learning, 2000, IEEE International Symposium on Circuits and Systems.
[22] Douglas Aberdeen. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.
[23] Tom Schaul, et al. Efficient natural evolution strategies, 2009, GECCO.
[24] Christian Igel, et al. Similarities and differences between policy gradient methods and evolution strategies, 2008, ESANN.
[25] Long-Ji Lin. Reinforcement learning for robots using neural networks, 1992.
[26] Nikolaus Hansen, et al. Completely Derandomized Self-Adaptation in Evolution Strategies, 2001, Evolutionary Computation.
[27] Jürgen Schmidhuber, et al. Recurrent policy gradients, 2010, Log. J. IGPL.
[28] Shun-ichi Amari, et al. Why natural gradient?, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98).
[29] Risto Miikkulainen, et al. Efficient evolution of neural networks through complexification, 2004.
[30] Tom Schaul, et al. Fitness Expectation Maximization, 2008, PPSN.
[31] D. Prokhorov. Toward effective combination of off-line and on-line training in ADP framework, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[32] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[33] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).
[34] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[35] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML '07.
[36] Judy A. Franklin, et al. Biped dynamic walking using reinforcement learning, 1997, Robotics Auton. Syst.
[37] Jürgen Schmidhuber, et al. Gödel Machines: Fully Self-referential Optimal Universal Self-improvers, 2007, Artificial General Intelligence.
[38] Dirk P. Kroese, et al. Cross-Entropy Method, 2011.
[39] Tom Schaul, et al. Stochastic search using the natural gradient, 2009, ICML '09.
[40] Eduardo Sontag, et al. Turing computability with neural nets, 1991.
[41] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[42] Julian Togelius, et al. Point-to-Point Car Racing: an Initial Study of Evolution Versus Temporal Difference Learning, 2007, IEEE Symposium on Computational Intelligence and Games.
[43] Marcus Hutter. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability (Texts in Theoretical Computer Science, An EATCS Series), 2006.
[44] J. Spall, et al. Theoretical framework for comparing several popular stochastic optimization approaches, 2002.
[45] Yoshua Bengio, et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies, 2001.
[46] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[47] Julian Togelius, et al. The WCCI 2008 simulated car racing competition, 2008, IEEE Symposium on Computational Intelligence and Games.
[48] Paul J. Werbos. Backpropagation Through Time: What It Does and How to Do It, 1990, Proceedings of the IEEE.
[49] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[50] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[51] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[52] Hans-Paul Schwefel, et al. Evolution strategies – A comprehensive introduction, 2002, Natural Computing.
[53] Chris Watkins. Learning from delayed rewards, 1989.
[54] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[55] Tom Schaul, et al. Natural Evolution Strategies, 2008, IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[56] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[57] Jing J. Liang, et al. Problem Definitions and Evaluation Criteria for the CEC 2005 Special Session on Real-Parameter Optimization, 2005.
[58] Shimon Whiteson, et al. Empirical Studies in Action Selection with Reinforcement Learning, 2007, Adapt. Behav.
[59] Hans-Georg Beyer. Toward a Theory of Evolution Strategies: Self-Adaptation, 1995, Evolutionary Computation.
[60] Vijaykumar Gullapalli. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.
[61] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[62] Tom Schaul, et al. Episodic Reinforcement Learning by Logistic Reward-Weighted Regression, 2008, ICANN.
[63] James C. Spall. Stochastic optimization and the simultaneous perturbation method, 1999, WSC '99.
[64] A. P. Wieland. Evolving neural network controllers for unstable systems, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[65] Hans-Paul Schwefel, et al. Two-Phase Nozzle and Hollow Core Jet Experiments, 1970.
[66] Ingo Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution [Evolution Strategy: Optimization of Technical Systems According to the Principles of Biological Evolution], 1973.
[67] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[68] Lonnie Chrisman. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[69] S. Shankar Sastry, et al. Autonomous Helicopter Flight via Reinforcement Learning, 2003, NIPS.
[70] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[71] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[72] Matthew Saffell, et al. Learning to trade via direct reinforcement, 2001, IEEE Trans. Neural Networks.
[73] Bram Bakker. Reinforcement Learning with Long Short-Term Memory, 2001, NIPS.
[74] Andrew McCallum, et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.
[75] Julian Togelius, et al. Evolution of Neural Networks for Helicopter Control: Why Modularity Matters, 2006, IEEE International Conference on Evolutionary Computation.
[76] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[77] Jürgen Schmidhuber, et al. Solving Deep Memory POMDPs with Recurrent Policy Gradients, 2007, ICANN.
[78] Risto Miikkulainen, et al. Accelerated Neural Evolution through Cooperatively Coevolved Synapses, 2008, J. Mach. Learn. Res.
[79] Vijaykumar Gullapalli. Reinforcement learning and its application to control, 1992.
[80] Geoffrey E. Hinton, et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.