Feature-based methods for large scale dynamic programming
[1] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[2] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[3] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[4] Richard E. Korf, et al. Planning as Search: A Quantitative Approach, 1987, Artif. Intell.
[5] F. Girosi, et al. Networks for approximation and learning, 1990, Proc. IEEE.
[6] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[7] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[8] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[9] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.
[10] Thomas L. Morin, et al. Computational Advances in Dynamic Programming, 1978.
[11] Philip E. Agre, et al. Computational Research on Interaction and Agency, 1995, Artif. Intell.
[12] Bhavik R. Bakshi, et al. Wave‐net: a multiresolution, hierarchical neural network with localized learning, 1993.
[13] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1994, Mach. Learn.
[14] D. Bertsekas, et al. Adaptive aggregation methods for infinite horizon dynamic programming, 1989.
[15] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[16] R. Bellman, et al. Functional Approximations and Dynamic Programming, 1959.
[17] Dimitri P. Bertsekas, et al. A Counterexample to Temporal Differences Learning, 1995, Neural Computation.
[18] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[19] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[20] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[21] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[22] Richard P. Lippmann, et al. LNKnet: Neural Network, Machine-Learning, and Statistical Software for Pattern Classification, 1993.
[23] Xu Xin. Reinforcement learning algorithm for partially observable Markov decision processes, 2004.
[24] Ward Whitt, et al. Approximations of Dynamic Programs, I, 1978, Math. Oper. Res.