Episodic Reinforcement Learning by Logistic Reward-Weighted Regression
Tom Schaul | Jürgen Schmidhuber | Jan Peters | Daan Wierstra
[1] H. Chernoff, et al. Elementary Decision Theory, 1988.
[2] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[3] Robert B. Fisher, et al. Incremental One-Class Learning with Bounded Computational Complexity, 2007, ICANN.
[4] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[5] P. Glynn. Optimization of stochastic systems via simulation, 1989, WSC '89.
[6] Jürgen Schmidhuber, et al. Solving Deep Memory POMDPs with Recurrent Policy Gradients, 2007, ICANN.
[7] Michael H. Bowling, et al. Learning predictive state representations using non-blind policies, 2006, ICML '06.
[8] Paul J. Werbos, et al. Backpropagation Through Time: What It Does and How to Do It, 1990, Proc. IEEE.
[9] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[10] P. J. Werbos. Backpropagation Through Time: What It Does and How to Do It, 1990.
[11] Geoffrey E. Hinton, et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.
[12] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML '07.
[13] Yoshua Bengio, et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies, 2001.
[14] Bram Bakker, et al. Reinforcement Learning with Long Short-Term Memory, 2001, NIPS.
[15] Michael L. Littman, et al. Planning with predictive state representations, 2004, Proceedings of the 2004 International Conference on Machine Learning and Applications.
[16] H. Chernoff, et al. Elementary Decision Theory, 1959.