Reducing policy degradation in neuro-dynamic programming
[1] Geoffrey J. Gordon. Stable Fitted Reinforcement Learning, 1995, NIPS.
[3] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[4] Michael Pinedo, et al. Scheduling: Theory, Algorithms, and Systems, 1994.
[5] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[6] Dimitri P. Bertsekas, et al. Missile defense and interceptor allocation by neuro-dynamic programming, 2000, IEEE Trans. Syst. Man Cybern. Part A.
[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[8] Chris Gaskett, et al. Q-Learning for Robot Control, 2002.
[9] Martin A. Riedmiller, et al. A Neural Reinforcement Learning Approach to Learn Local Dispatching Policies in Production Scheduling, 1999, IJCAI.
[10] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[11] Dimitri P. Bertsekas, et al. Missile Defense and Interceptor Allocation by, 2000.