Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
暂无分享,去创建一个
[1] E. Rowland. Theory of Games and Economic Behavior , 1946, Nature.
[2] O. Hernondex-lerma,et al. Adaptive Markov Control Processes , 1989 .
[3] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[4] L. C. Baird,et al. Reinforcement learning in continuous time: advantage updating , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[5] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[6] Marco Colombetti,et al. Robot Shaping: Developing Autonomous Agents Through Learning , 1994, Artif. Intell..
[7] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[8] David S. Touretzky,et al. Shaping robot behavior using principles from instrumental conditioning , 1997, Robotics Auton. Syst..
[9] Stuart J. Russell. Learning agents for uncertain environments (extended abstract) , 1998, COLT' 98.
[10] Preben Alstrøm,et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.
[11] Yishay Mansour,et al. Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.