### Tree-Based Batch Mode Reinforcement Learning


[1] R. Bellman, et al. Polynomial approximation—a new computational technique in dynamic programming: Allocation processes, 1962.

[2] D. Luenberger. Optimization by Vector Space Methods, 1968.

[3] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.

[4] C. Atkeson, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time, 1993.

[5] Mark W. Spong, et al. Swing up control of the Acrobot, 1994, Proceedings of the 1994 IEEE International Conference on Robotics and Automation.

[6] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.

[7] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.

[8] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.

[9] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.

[10] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[11] Geoffrey J. Gordon. Online Fitted Reinforcement Learning, 1995.

[12] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.

[13] Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.

[14] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[15] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.

[16] Manuela M. Veloso, et al. Tree Based Discretization for Continuous State Space Reinforcement Learning, 1998, AAAI/IAAI.

[17] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.

[18] O. Hernández-Lerma, et al. Discrete-time Markov control processes, 1999.

[19] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.

[20] Thomas G. Dietterich, et al. Efficient Value Function Approximation Using Regression Trees, 1999.

[21] Junichiro Yoshimoto, et al. Application of reinforcement learning to balancing of Acrobot, 1999, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028).

[22] L. Breiman. Some Infinity Theory for Predictor Ensembles, 2000.

[23] Peter W. Glynn, et al. Kernel-based reinforcement learning in average-cost problems, 2002, IEEE Trans. Autom. Control.

[24] Leslie Pack Kaelbling, et al. Practical Reinforcement Learning in Continuous Spaces, 2000, ICML.

[25] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.

[26] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.

[27] Rosaleen J. Anderson. Near optimal closed-loop control Application to electric power systems, 2003.

[28] J. Langford, et al. Reducing T-step reinforcement learning to classification, 2003.

[29] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.

[30] Pierre Geurts, et al. Iteratively Extending Time Horizon Reinforcement Learning, 2003, ECML.

[31] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.

[32] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.

[33] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[34] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.

[35] Andrew W. Moore, et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 2004, Machine Learning.

[36] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.

[37] Leo Breiman, et al. Bagging predictors, 2004, Machine Learning.

[38] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.

[39] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.

[40] D. Ernst, et al. Approximate Value Iteration in the Reinforcement Learning Context. Application to Electrical Power System Control, 2005.

[41] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 2005, IEEE Transactions on Neural Networks.

[42] Richard S. Sutton, et al. Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.

[43] Pierre Geurts, et al. Extremely randomized trees, 2006, Machine Learning.

[44] Yi Lin, et al. Random Forests and Adaptive Nearest Neighbors, 2006.

[45] Liming Xiang, et al. Kernel-Based Reinforcement Learning, 2006, ICIC.