Basis Function Adaptation in Temporal Difference Reinforcement Learning
[1] Ian H. Witten, et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[2] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[3] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.
[4] Steven J. Bradtke, et al. Reinforcement Learning Applied to Linear Quadratic Regulation, 1992, NIPS.
[5] S. Haykin. Neural Networks: A Comprehensive Foundation, 1994.
[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[8] Peter Auer, et al. Exponentially many local minima for single neurons, 1995, NIPS.
[9] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[10] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[11] G. McLachlan, et al. The EM algorithm and extensions, 1996.
[12] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[13] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[14] R. Rubinstein. The Cross-Entropy Method for Combinatorial and Continuous Optimization, 1999.
[15] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[16] Bjarne E. Helvik, et al. Using the Cross-Entropy Method to Guide/Govern Mobile Agent's Path Finding in Networks, 2001, MATA.
[17] Joydeep Ghosh, et al. An overview of radial basis function networks, 2001.
[18] Michail G. Lagoudakis, et al. Model-Free Least-Squares Policy Iteration, 2001, NIPS.
[19] Doina Precup, et al. Characterizing Markov Decision Processes, 2002, ECML.
[20] Shie Mannor, et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.
[21] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[22] Pawel Strumillo, et al. Radial Basis Function Neural Networks: Theory and Applications, 2003.
[23] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.
[24] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.
[25] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[26] Dirk P. Kroese, et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning, 2004.
[27] Dirk P. Kroese, et al. Cross-Entropy Method, 2011.
[28] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[29] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[30] Dirk P. Kroese, et al. Application of the Cross-Entropy Method to the Buffer Allocation Problem in a Simulation-Based Environment, 2005, Ann. Oper. Res.
[31] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[32] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[33] Shie Mannor, et al. A Tutorial on the Cross-Entropy Method, 2005, Ann. Oper. Res.