Exploration Bonuses and Dual Control