Online learning control by association and reinforcement.
[1] Stuart E. Dreyfus, et al. Applied Dynamic Programming, 1965.
[2] R. Larson, et al. A survey of dynamic programming computational procedures, 1967, IEEE Transactions on Automatic Control.
[3] Donald E. Kirk, et al. Optimal control theory: an introduction, 1970.
[4] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[6] Gerald Tesauro, et al. Neurogammon: a neural-network backgammon program, 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[7] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn..
[8] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[9] Richard S. Sutton, et al. A Menu of Designs for Reinforcement Learning Over Time, 1995.
[10] Roberto A. Santiago, et al. Adaptive critic designs: A case study for neurocontrol, 1995, Neural Networks.
[11] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[12] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[13] S. N. Balakrishnan, et al. A neighboring optimal adaptive critic for missile guidance, 1996.
[14] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.
[15] Tilman Börgers, et al. Learning Through Reinforcement and Replicator Dynamics, 1997.
[16] Ashwin Ram, et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, 1997, Adapt. Behav..
[17] K. G. Eltohamy, et al. Nonlinear optimal control of a triple link inverted pendulum with single control input, 1998.