Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization
Matthew W. Hoffman | Alessandro Lazaric | Rémi Munos | Mohammad Ghavamzadeh
[1] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.
[2] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.
[3] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[4] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML '09.
[5] Ronald Parr, et al. Linear Complementarity for Regularized Policy Evaluation and Improvement, 2010, NIPS.
[6] Matthew W. Hoffman, et al. Finite-Sample Analysis of Lasso-TD, 2011, ICML.
[7] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[9] Eric R. Ziegel, et al. The Elements of Statistical Learning, 2003, Technometrics.
[10] R. Tibshirani, et al. Least angle regression, 2004, math/0406456.
[11] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[12] A. Tsybakov, et al. Sparsity oracle inequalities for the Lasso, 2007, 0705.3308.
[13] R. Tibshirani, et al. Pathwise Coordinate Optimization, 2007, 0708.1485.
[14] Matthieu Geist, et al. ℓ1-Penalized Projected Bellman Residual, 2011, EWRL.
[15] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[16] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.