Reinforcement Learning of Trajectory Distributions: Applications in Assisted Teleoperation and Motion Planning

Most learning-from-demonstration approaches do not address suboptimal demonstrations or drastic changes in the environment that occur after the demonstrations were made. For example, in real teleoperation tasks, the demonstrations provided by the user are often suboptimal due to interface and hardware limitations. In tasks involving co-manipulation and manipulation planning, the environment often changes due to unexpected obstacles, rendering previous demonstrations invalid. This paper presents a reinforcement learning algorithm that exploits relevance functions to tackle such problems. It introduces the Pearson correlation coefficient as a measure of the relevance of policy parameters with respect to each component of the cost function to be optimized. The method is first demonstrated in a static environment where the quality of the teleoperation is compromised by the visual interface (operating a robot in a three-dimensional task through a simple 2D monitor). Afterward, we test the method in a dynamic environment using a real 7-DoF robot arm, where trajectory distributions are computed online via Gaussian process regression.
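To make the relevance measure concrete, the sketch below shows one plausible way to compute it: sample rollouts from the current trajectory distribution and correlate each policy parameter with each cost component. This is a minimal illustration only; the function and variable names (parameter_relevance, theta_samples, cost_components) are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def parameter_relevance(theta_samples, cost_components):
    """Relevance of each policy parameter to each cost component,
    measured as the absolute Pearson correlation coefficient.

    theta_samples   : (n_rollouts, n_params) policy parameters sampled
                      from the current trajectory distribution
    cost_components : (n_rollouts, n_costs) value of each cost term
                      evaluated on the corresponding rollout
    returns         : (n_params, n_costs) matrix of |r| values in [0, 1]
    """
    n_rollouts, n_params = theta_samples.shape
    n_costs = cost_components.shape[1]
    relevance = np.zeros((n_params, n_costs))
    for i in range(n_params):
        for j in range(n_costs):
            # np.corrcoef returns the 2x2 correlation matrix; the
            # off-diagonal entry is the Pearson r between the two series.
            r = np.corrcoef(theta_samples[:, i], cost_components[:, j])[0, 1]
            relevance[i, j] = abs(r)
    return relevance
```

A parameter with high |r| for a given cost term would then be treated as relevant to that term, so its update is driven mainly by that component of the cost, which matches the intuition the abstract describes.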
