Paper information - Multi-Channel Interactive Reinforcement Learning for Sequential Tasks

Multi-Channel Interactive Reinforcement Learning for Sequential Tasks

The ability to learn new tasks by sequencing already known skills is an important requirement for future robots. Reinforcement learning is a powerful tool for this, as it allows a robot to learn and improve how it combines skills for sequential tasks. However, in real robotic applications, the cost of sample collection and exploration prevents the application of reinforcement learning to a variety of tasks. To overcome these limitations, human input during reinforcement learning can speed up learning, guide exploration, and prevent the choice of disastrous actions. Nevertheless, there is a lack of experimental evaluations of multi-channel interactive reinforcement learning systems that solve robotic tasks with input from inexperienced human users, in particular for cases where the human input might be partially wrong. Therefore, in this paper, we present an approach that incorporates multiple human input channels for interactive reinforcement learning in a unified framework and evaluate it on two robotic tasks with 20 inexperienced human subjects. To enable the robot to also handle potentially incorrect human input, we incorporate a novel concept of self-confidence, which allows the robot to question human input after an initial learning phase. The second robotic task is specifically designed to investigate whether this self-confidence enables the robot to achieve learning progress even if the human input is partially incorrect. Further, we evaluate how humans react to the robot's suggestions once the robot notices that human input might be wrong. Our experimental evaluations show that our approach can successfully incorporate human input to accelerate the learning process in both robotic tasks, even if that input is partially wrong. However, not all humans were willing to accept the robot's suggestions or its questioning of their input, particularly if they did not understand the learning process and the reasons behind the robot's suggestions. We believe that the findings from this experimental evaluation can be beneficial for the future design of algorithms and interfaces of interactive reinforcement learning systems used by inexperienced users.
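The abstract does not spell out how self-confidence is computed, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a tabular Q-learner, a single advice channel (a suggested action), and a hypothetical confidence test: the agent questions advice only after it has updated a state at least `min_visits` times and its own best action beats the advised one by more than `confidence_margin`. The class name, parameters, and thresholds are all illustrative assumptions.

```python
import random
from collections import defaultdict


class AdvisedQLearner:
    """Tabular Q-learner that can follow or question human action advice.

    Illustrative sketch only: the confidence test below is an assumption,
    not the criterion used in the paper.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1,
                 confidence_margin=0.5, min_visits=5, seed=0):
        self.actions = list(actions)
        self.alpha = alpha                    # learning rate
        self.gamma = gamma                    # discount factor
        self.epsilon = epsilon                # exploration rate without advice
        self.confidence_margin = confidence_margin
        self.min_visits = min_visits
        self.q = defaultdict(float)           # (state, action) -> value estimate
        self.visits = defaultdict(int)        # state -> number of updates
        self.rng = random.Random(seed)

    def greedy_action(self, state):
        # Ties are broken in favor of the first action in self.actions.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def is_self_confident(self, state, advised):
        """Hypothetical test: enough experience and a clear Q-value gap."""
        if self.visits[state] < self.min_visits:
            return False
        best = self.greedy_action(state)
        gap = self.q[(state, best)] - self.q[(state, advised)]
        return gap > self.confidence_margin

    def select_action(self, state, advice=None):
        """Return (action, questioned); questioned is True if advice was overridden."""
        if advice is not None:
            if self.is_self_confident(state, advice):
                return self.greedy_action(state), True
            return advice, False
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions), False
        return self.greedy_action(state), False

    def update(self, state, action, reward, next_state, done):
        """Standard one-step Q-learning update."""
        target = reward
        if not done:
            target += self.gamma * max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        self.visits[state] += 1
```

In an interactive loop, a `questioned` value of `True` would be the point at which the robot presents its own suggestion to the human instead of silently executing the advice. Whether the human may then insist on the original input is a design choice that this sketch leaves open.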
