Improving Policies without Measuring Merits
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[2] Mary W. Cooper, et al. Dynamic Programming and the Calculus of Variations, 1981.
[3] David S. Broomhead, et al. Multivariable Functional Interpolation and Adaptive Networks, 1988, Complex Syst.
[4] D. Broomhead, et al. Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks, 1988.
[5] C. Watkins. Learning from delayed rewards, 1989.
[6] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[7] Barbara Moore, et al. Theory of networks for learning, 1990, Defense, Security, and Sensing.
[8] T. Poggio, et al. Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks, 1990, Science.
[9] Sebastian Thrun, et al. Explanation-Based Neural Network Learning for Robot Control, 1992, NIPS.
[10] Christopher G. Atkeson, et al. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming, 1993, NIPS.
[11] Richard S. Sutton, et al. A Menu of Designs for Reinforcement Learning Over Time, 1995.
[12] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.