Model-based Adversarial Imitation Learning

Generative adversarial learning is a popular new approach to training generative models that has also proven successful for related problems. The general idea is to maintain an oracle $D$ that discriminates between the expert's data distribution and that of the generative model $G$. The generative model is trained to capture the expert's distribution by maximizing the probability that $D$ misclassifies the data it generates. Overall, the system is \emph{differentiable} end-to-end and is trained using basic backpropagation. This type of learning has been successfully applied to policy imitation in a model-free setup. However, a model-free approach does not allow the system to be differentiable end-to-end, which forces the use of high-variance gradient estimators. In this paper we introduce Model-based Adversarial Imitation Learning (MAIL), a model-based approach to adversarial imitation learning. We show how to use a forward model to make the system fully differentiable, which enables us to train policies using the (stochastic) gradient of $D$. Moreover, our approach requires relatively few environment interactions and fewer hyper-parameters to tune. We test our method on the MuJoCo physics simulator and report initial results that surpass the current state of the art.
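To make the idea concrete, the sketch below shows what one MAIL-style update could look like in PyTorch. It is not the authors' implementation; the network shapes, the unroll length, and the helper names are assumptions made for illustration. A discriminator is fit to separate expert $(s, a)$ pairs from generated ones, and the policy is unrolled through a learned forward model so that the gradient of $D$ flows back to the policy parameters by plain backpropagation, with no likelihood-ratio estimator.

```python
# A minimal sketch of a MAIL-style update (not the paper's code).
# Assumed components: a deterministic policy pi(s) -> a, a learned forward
# model f(s, a) -> s', and a discriminator D(s, a) -> logit for "expert".
import torch
import torch.nn as nn

S_DIM, A_DIM, H = 8, 2, 64   # hypothetical state/action sizes and hidden width
T = 5                        # hypothetical unroll length through the forward model

policy = nn.Sequential(nn.Linear(S_DIM, H), nn.Tanh(), nn.Linear(H, A_DIM))
forward_model = nn.Sequential(nn.Linear(S_DIM + A_DIM, H), nn.Tanh(), nn.Linear(H, S_DIM))
discriminator = nn.Sequential(nn.Linear(S_DIM + A_DIM, H), nn.Tanh(), nn.Linear(H, 1))

opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_s, expert_a, gen_s, gen_a):
    """Train D to label expert (s, a) pairs as 1 and generated pairs as 0."""
    expert_logits = discriminator(torch.cat([expert_s, expert_a], dim=-1))
    gen_logits = discriminator(torch.cat([gen_s, gen_a], dim=-1))
    loss = bce(expert_logits, torch.ones_like(expert_logits)) + \
           bce(gen_logits, torch.zeros_like(gen_logits))
    opt_d.zero_grad(); loss.backward(); opt_d.step()

def policy_step(s0):
    """Unroll the policy through the forward model and backpropagate D's
    gradient through the whole chain to the policy parameters."""
    loss, s = 0.0, s0
    for _ in range(T):
        a = policy(s)
        logits = discriminator(torch.cat([s, a], dim=-1))
        # The policy tries to make D classify its (s, a) pairs as expert (label 1).
        loss = loss + bce(logits, torch.ones_like(logits))
        s = forward_model(torch.cat([s, a], dim=-1))  # differentiable transition
    opt_pi.zero_grad(); loss.backward(); opt_pi.step()

# Example usage with random tensors standing in for expert data and rollouts:
expert_s, expert_a = torch.randn(32, S_DIM), torch.randn(32, A_DIM)
with torch.no_grad():
    gen_s = torch.randn(32, S_DIM)
    gen_a = policy(gen_s)
discriminator_step(expert_s, expert_a, gen_s, gen_a)
policy_step(torch.randn(32, S_DIM))
```

The key point mirrors the abstract: because the transition $s \mapsto f(s, \pi(s))$ is itself a differentiable network, the policy receives exact (stochastic) gradients of $D$'s output rather than the high-variance estimates a model-free setup would require.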
