Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement

Abstract Inverse reinforcement learning (IRL) is a powerful tool for teaching by demonstration, provided that sufficiently diverse and optimal demonstrations are given and the learner agent perceives them correctly. These conditions are hard to meet in practice: a trainer cannot cover every possibility with demonstrations and may partially fail to follow the optimal behavior. Moreover, the trainer and the learner perceive the environment, including the trainer's actions, differently. A practical way to overcome these problems is to combine the trainer's demonstrations with evaluative feedback. We propose an interactive learning approach that addresses the challenge of non-optimal demonstrations by integrating human evaluative feedback into the IRL process, given sufficiently diverse demonstrations and the domain transition model. To this end, we develop a probabilistic model of human feedback and iteratively improve the agent's policy using Bayes' rule. We then integrate this information into an extended IRL algorithm to enhance the learned reward function. We examine the developed approach in one experimental and two simulated tasks: grid-world navigation, a highway car-driving system, and a navigation task with the e-puck robot. The results show that the proposed approach significantly improves efficiency across different levels of non-optimality in the demonstrations and different numbers of evaluative feedback signals.
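The Bayesian treatment of evaluative feedback described above can be illustrated with a minimal sketch. The model below is an assumption for illustration, not the paper's exact formulation: it uses a binary (positive/negative) feedback signal and a single consistency parameter `C` (the probability that any one feedback signal is correct), in the spirit of policy-shaping methods [8]. The names `feedback_posterior` and `BayesianFeedbackPolicy` are hypothetical.

```python
import numpy as np

def feedback_posterior(delta, C=0.9):
    """Posterior probability that an action is optimal, given feedback.

    delta : net feedback count (#positive - #negative) for a (state, action).
    C     : assumed probability that a single feedback signal is correct.

    Binomial feedback model used by policy-shaping approaches:
        P(a optimal | feedback) = C^delta / (C^delta + (1 - C)^delta)
    """
    return C**delta / (C**delta + (1 - C)**delta)

class BayesianFeedbackPolicy:
    """Per-state action beliefs updated iteratively from binary feedback."""

    def __init__(self, n_states, n_actions, C=0.9):
        # Net feedback counts for every (state, action) pair.
        self.delta = np.zeros((n_states, n_actions), dtype=int)
        self.C = C

    def update(self, state, action, positive):
        """Record one evaluative feedback signal from the trainer."""
        self.delta[state, action] += 1 if positive else -1

    def action_probs(self, state):
        """Normalized belief over actions in the given state."""
        p = feedback_posterior(self.delta[state], self.C)
        return p / p.sum()
```

With no feedback, `feedback_posterior(0)` is 0.5 (indifference); repeated positive signals push the belief toward 1, so the feedback-derived action distribution can then be combined with the IRL-based policy.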

[1] Brett Browning, et al. Learning by demonstration with critique from a human teacher, 2007, 2nd ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[2] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.

[3] J. Andrew Bagnell, et al. Maximum margin planning, 2006, ICML.

[4] Michael L. Littman, et al. Apprenticeship Learning About Multiple Intentions, 2011, ICML.

[5] David Silver, et al. Learning from Demonstration for Autonomous Navigation in Complex Unstructured Terrain, 2010, Int. J. Robotics Res.

[6] Monica N. Nicolescu, et al. Natural methods for robot task learning: instructive demonstrations, generalization and practice, 2003, AAMAS '03.

[7] Peter Stone, et al. Interactively shaping agents via human reinforcement: the TAMER framework, 2009, K-CAP '09.

[8] Andrea Lockerd Thomaz, et al. Policy Shaping: Integrating Human Feedback with Reinforcement Learning, 2013, NIPS.

[9] Francesco Mondada, et al. The e-puck, a Robot Designed for Education in Engineering, 2009.

[10] Peter Stone, et al. Reinforcement learning from simultaneous human and MDP reward, 2012, AAMAS.

[11] Sonia Chernova, et al. Using Human Demonstrations to Improve Reinforcement Learning, 2011, AAAI Spring Symposium: Help Me Help You: Bridging the Gaps in Human-Agent Collaboration.

[12] Thomas G. Dietterich, et al. Reinforcement Learning Via Practice and Critique Advice, 2010, AAAI.

[13] Michael H. Bowling, et al. Apprenticeship learning using linear programming, 2008, ICML '08.

[14] Majid Nili Ahmadabadi, et al. Conceptual Imitation Learning in a Human-Robot Interaction Paradigm, 2012, TIST.

[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[16] Marc Toussaint, et al. Direct Loss Minimization Inverse Optimal Control, 2015, Robotics: Science and Systems.

[17] Jan Peters, et al. Relative Entropy Inverse Reinforcement Learning, 2011, AISTATS.

[18] Abdelkader El Kamel, et al. Neural inverse reinforcement learning in autonomous navigation, 2016, Robotics Auton. Syst.

[19] Eyal Amir, et al. Bayesian Inverse Reinforcement Learning, 2007, IJCAI.

[20] Peter Stone, et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.

[21] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.

[22] Sergey Levine, et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, 2016, ICML.

[23] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.

[24] Pieter Abbeel, et al. Learning for control from multiple demonstrations, 2008, ICML '08.

[25] Siyuan Liu, et al. Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise, 2014, AAAI.

[26] Andrea Lockerd Thomaz, et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, 2006, AAAI.

[27] Stefan Schaal, et al. Robot Programming by Demonstration, 2009, Springer Handbook of Robotics.