Learning Relative Return Policies With Upside-Down Reinforcement Learning
[1] Tom Schaul, et al. Episodic Reinforcement Learning by Logistic Reward-Weighted Regression, 2008, ICANN.
[2] Stefan Schaal, et al. Reinforcement Learning by Reward-Weighted Regression for Operational Space Control, 2007, ICML.
[3] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[4] Sergey Levine, et al. Reward-Conditioned Policies, 2019, arXiv.
[5] S. Gu, et al. Generalized Decision Transformer for Offline Hindsight Information Matching, 2021, ICLR.
[6] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS.
[7] Geoffrey E. Hinton, et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.
[8] Juergen Schmidhuber, et al. Reinforcement Learning Upside Down: Don't Predict Rewards - Just Map Them to Actions, 2019, arXiv.
[9] Masashi Sugiyama, et al. Efficient Sample Reuse in EM-Based Policy Search, 2009, ECML/PKDD.
[10] Tom Schaul, et al. Fitness Expectation Maximization, 2008, PPSN.
[11] Filipe Wall Mutz, et al. Training Agents Using Upside-Down Reinforcement Learning, 2019, arXiv.
[12] Sergey Levine, et al. Learning to Reach Goals via Iterated Supervised Learning, 2019, ICLR.
[13] Richard S. Sutton, et al. Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, arXiv.
[16] Pieter Abbeel, et al. Decision Transformer: Reinforcement Learning via Sequence Modeling, 2021, NeurIPS.
[17] Masashi Sugiyama, et al. Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning, 2011, Neural Computation.
[18] Sergey Levine, et al. Offline Reinforcement Learning as One Big Sequence Modeling Problem, 2021, NeurIPS.
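
For orientation, a minimal sketch of the command-conditioned supervised objective behind Upside-Down Reinforcement Learning, as introduced in [8] and trained in practice in [11]: the policy receives the observation together with a command (desired return, desired horizon) and is fit by supervised learning to the actions that achieved that command in logged episodes. This is a sketch under assumed names and shapes, not the paper's implementation; all identifiers below are illustrative.

import torch
import torch.nn as nn

# Command-conditioned policy: maps (observation, command) to action logits.
# The command encodes a desired return and a desired horizon, following the
# Upside-Down RL recipe of [8] and [11]. Shapes and sizes are illustrative.
class CommandConditionedPolicy(nn.Module):
    def __init__(self, obs_dim, cmd_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + cmd_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, cmd):
        return self.net(torch.cat([obs, cmd], dim=-1))

# Supervised objective: reproduce the action that, in hindsight, achieved
# the commanded return within the commanded horizon in a logged episode.
def udrl_loss(policy, obs, cmd, action_taken):
    logits = policy(obs, cmd)
    return nn.functional.cross_entropy(logits, action_taken)

# Dummy batch to show one training step; real commands would be computed
# from returns-to-go and remaining horizons of recorded trajectories.
policy = CommandConditionedPolicy(obs_dim=4, cmd_dim=2, n_actions=2)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)  # Adam as in [14]
obs = torch.randn(32, 4)
cmd = torch.randn(32, 2)
actions = torch.randint(0, 2, (32,))
loss = udrl_loss(policy, obs, cmd, actions)
loss.backward()
optimizer.step()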