Reinforcement Learning Through Gradient Descent
[1] N. Rajan,et al. Pursuit-Evasion of Two Aircraft in a Horizontal Plane , 1980 .
[2] Lamberto Cesari,et al. Optimization-Theory And Applications , 1983 .
[3] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[4] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[5] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[6] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .
[7] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[8] H. White. Some Asymptotic Results for Learning in Single Hidden-Layer Feedforward Network Models , 1989 .
[9] R. Sutton,et al. Connectionist Learning for Control: An Overview , 1989 .
[10] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[11] A. Barto,et al. Learning and Sequential Decision Making , 1989 .
[12] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[13] L. Baird,et al. A Mathematical Analysis of Actor-Critic Architectures for Learning Optimal Controls Through Incremental Dynamic Programming , 1990 .
[14] Gerald Tesauro,et al. Neurogammon: a neural-network backgammon program , 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[15] Peter J. Millington,et al. Associative reinforcement learning for optimal control , 1991 .
[16] Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
[17] Vijaykumar Gullapalli,et al. Reinforcement learning and its application to control , 1992 .
[18] Olvi L. Mangasarian,et al. Backpropagation Convergence via Deterministic Nonmonotone Perturbed Minimization , 1993, NIPS.
[19] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[20] Leemon C. Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
[21] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[23] Alexei A. Gaivoronski,et al. Convergence properties of backpropagation for neural nets via theory of stochastic gradient methods. Part 1 , 1994 .
[24] Mikhail Solodov,et al. Stability Properties of the Gradient Projection Method with Applications to the Backpropagation Algorithm , 1994 .
[25] A. Harry Klopf,et al. Advantage Updating Applied to a Differential Game , 1994, NIPS.
[26] Andrew G. Barto,et al. Reinforcement Learning and Dynamic Programming , 1995 .
[27] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[28] V. Tresp,et al. Missing and noisy data in nonlinear time-series prediction , 1995, Proceedings of 1995 IEEE Workshop on Neural Networks for Signal Processing.
[29] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[30] Geoffrey J. Gordon. Stable Fitted Reinforcement Learning , 1995, NIPS.
[31] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[32] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[33] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[34] Mikhail V. Solodov,et al. Nonmonotone and perturbed optimization , 1996 .
[35] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[36] M. V. Solodov,et al. Convergence Analysis of Perturbed Feasible Descent Methods , 1997 .
[37] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[38] M. Solodov,et al. Error Stability Properties of Generalized Gradient-Type Algorithms , 1998 .
[39] Peter Marbach,et al. Simulation-based optimization of Markov decision processes , 1998 .
[40] Leslie Pack Kaelbling,et al. Learning Policies with External Memory , 1999, ICML.
[41] Stanton Earl Weaver,et al. A Theoretical Framework for Local Adaptive Networks in Static and Dynamic Systems , 1999 .
[42] Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .
[43] John N. Tsitsiklis,et al. Gradient Convergence in Gradient methods with Errors , 1999, SIAM J. Optim..