Weighted Likelihood Policy Search with Model Selection
Yoshinobu Kawahara | Takashi Washio | Tsuyoshi Ueno | Kohei Hayashi
[1] H. Akaike. A new look at the statistical model identification , 1974 .
[2] G. Schwarz. Estimating the Dimension of a Model , 1978 .
[3] R. C. Bradley. Basic Properties of Strong Mixing Conditions , 1985 .
[4] P. Bickel. Efficient and Adaptive Estimation for Semiparametric Models , 1993 .
[5] K. Do,et al. Efficient and Adaptive Estimation for Semiparametric Models. , 1994 .
[6] Geoffrey E. Hinton,et al. Using Expectation-Maximization for Reinforcement Learning , 1997, Neural Computation.
[7] S. Amari,et al. Information geometry of estimating functions in semi-parametric statistical models , 1997 .
[8] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[9] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[10] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[11] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[12] Guillaume Bouchard,et al. The Tradeoff Between Generative and Discriminative Classifiers , 2004 .
[13] R. C. Bradley. Basic properties of strong mixing conditions. A survey and some open questions , 2005, math/0511078.
[14] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[15] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[16] Rémi Munos,et al. Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation , 2005, J. Mach. Learn. Res..
[17] Stefan Schaal,et al. Reinforcement learning by reward-weighted regression for operational space control , 2007, ICML '07.
[18] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[20] Marc Toussaint,et al. Learning model-free robot control by a Monte Carlo EM algorithm , 2009, Auton. Robots.
[21] Csaba Szepesvári,et al. Model Selection in Reinforcement Learning , 2011, Machine Learning.
[22] Jan Peters,et al. Policy Search for Motor Primitives in Robotics , 2022 .
[23] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[24] Joelle Pineau,et al. PAC-Bayesian Model Selection for Reinforcement Learning , 2010, NIPS.
[25] Stefan Schaal,et al. A Generalized Path Integral Control Approach to Reinforcement Learning , 2010, J. Mach. Learn. Res..
[26] Gang Niu,et al. Analysis and Improvement of Policy Gradient Estimation , 2011, NIPS.
[27] Motoaki Kawanabe,et al. Generalized TD Learning , 2011, J. Mach. Learn. Res..
[28] Masashi Sugiyama,et al. Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning , 2011, Neural Computation.
[29] Marc Toussaint,et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference , 2012, Robotics: Science and Systems.
[30] Hilbert J. Kappen,et al. Dynamic policy programming , 2010, J. Mach. Learn. Res..
[31] Vicenç Gómez,et al. Optimal control as a graphical model inference problem , 2009, Machine Learning.
[32] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..