An Algorithmic Perspective on Imitation Learning

As robots and other intelligent agents move from simple environments and problems to more complex, unstructured settings, manually programming their behavior has become increasingly challenging and expensive. Often, it is easier for a teacher to demonstrate a desired behavior than to engineer it by hand. This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning.

This work provides an introduction to imitation learning. It covers the underlying assumptions and approaches and how they relate; the rich set of algorithms developed to tackle the problem; and advice on effective tools and implementation. We intend this paper to serve two audiences. First, we want to familiarize machine learning experts with the challenges of imitation learning, particularly those arising in robotics, and with the interesting theoretical and practical distinctions between it and more familiar frameworks such as statistical supervised learning theory and reinforcement learning. Second, we want to give roboticists and experts in applied artificial intelligence a broader appreciation for the frameworks and tools available for imitation learning. We pay particular attention to the intimate connection between imitation learning approaches and those of structured prediction [Daumé III et al., 2009].

To structure this discussion, we categorize imitation learning techniques based on the following key criteria, which drive algorithmic decisions:

1) The structure of the policy space. Is the learned policy a time-indexed trajectory (trajectory learning), a mapping from observations to actions (so-called behavioral cloning [Bain and Sammut, 1996]), or the result of a complex optimization or planning problem solved at each execution, as is common in inverse optimal control methods [Kalman, 1964, Moylan and Anderson, 1973]?

2) The information available during training and testing. In particular, is the learning algorithm privy to the full state that the teacher possesses? Is the learner able to interact with the teacher and gather corrections or more data? Does the learner have a (typically a priori) model of the system with which it interacts? Does the learner have access to the reward (cost) function that the teacher is attempting to optimize?

3) The notion of success. Different algorithmic approaches provide varying guarantees on the resulting learned behavior. These guarantees range from weaker (e.g., measuring disagreement with the teacher's decisions) to stronger (e.g., providing guarantees on the performance of the learner with respect to a true cost function, either known or unknown).

We organize our work by paying particular attention to distinction 1), dividing imitation learning into directly replicating the desired behavior (sometimes called behavioral cloning) and learning the hidden objectives of the desired behavior from demonstrations (called inverse optimal control or inverse reinforcement learning [Russell, 1998]). In the latter case, behavior arises as the result of an optimization problem solved for each new instance the learner faces. In addition to analyzing methods, we discuss the design decisions a practitioner must make when selecting an imitation learning approach.
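To make this last distinction concrete, a minimal formulation (the notation and the choice of surrogate loss are our own, not taken verbatim from the text) contrasts the two problem statements:

\[
\hat{\pi}_{\mathrm{BC}} = \operatorname*{arg\,min}_{\pi \in \Pi} \; \mathbb{E}_{(s, a^{*}) \sim \mathcal{D}}\left[\ell\left(\pi(s), a^{*}\right)\right],
\qquad
\hat{\pi}_{\mathrm{IOC}} = \operatorname*{arg\,min}_{\pi} \; \mathbb{E}_{\pi}\left[\sum_{t} \hat{c}(s_{t}, a_{t})\right],
\]

where \(\mathcal{D}\) is the set of demonstrated state-action pairs, \(\ell\) is a per-action surrogate loss, and \(\hat{c}\) is a cost function estimated so that the teacher's demonstrations incur lower cumulative cost than alternative behaviors. Behavioral cloning fits the policy directly; inverse optimal control recovers \(\hat{c}\) first and obtains the policy by planning against it.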
Moreover, application examples, such as robots that play table tennis [Kober and Peters, 2009], programs that play the game of Go [Silver et al., 2016], and systems that understand natural language [Wen et al., 2015], illustrate the properties and motivations behind different forms of imitation learning. We conclude by presenting a set of open questions and pointing towards possible future research directions for machine learning.
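As a small illustration of the behavioral cloning route, the following sketch reduces imitation to supervised regression. It is a toy we add for concreteness: the linear policy class, the synthetic demonstrations, and all variable names are our own assumptions, not code from the text.

import numpy as np

# Behavioral cloning as supervised learning: fit a linear policy
# pi(s) = s @ W to demonstrated state-action pairs by least squares.
rng = np.random.default_rng(0)

N, d_s, d_a = 500, 4, 2                     # demos, state dim, action dim
S = rng.normal(size=(N, d_s))               # observed states
W_teacher = rng.normal(size=(d_s, d_a))     # unknown teacher mapping (toy)
A = S @ W_teacher + 0.01 * rng.normal(size=(N, d_a))  # noisy expert actions

# Supervised step: minimize squared action error over the demonstrations.
W_hat, *_ = np.linalg.lstsq(S, A, rcond=None)

def policy(s):
    """Learned policy: map observations to actions."""
    return s @ W_hat

print("training MSE:", float(np.mean((policy(S) - A) ** 2)))

# Caveat: at test time the learner visits states induced by its own
# actions, so small errors can compound along the trajectory; this
# covariate-shift problem motivates interactive methods such as
# DAgger (see reference [108] below).

Any regressor or classifier can replace the least-squares fit without changing the reduction; the choice of policy class is exactly criterion 1) above.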

[1] E. Jaynes. Information Theory and Statistical Mechanics, 1957.

[2] R. E. Kalman, et al. When Is a Linear Control System Optimal?, 1964.

[3] Donald Michie, et al. Man-Machine Co-operation on a Learning Task, 1969.

[4] B. Anderson, et al. Nonlinear regulator theory and an inverse optimal control problem, 1973.

[5] S. Chiba, et al. Dynamic programming algorithm optimization for spoken word recognition, 1978.

[6] Lawrence R. Rabiner, et al. A tutorial on Hidden Markov Models, 1986.

[7] Dean Pomerleau. ALVINN, an autonomous land vehicle in a neural network, 1989, NIPS.

[8] Tomás Lozano-Pérez, et al. Task-level planning of pick-and-place robot motions, 1989, Computer.

[9] Geoffrey E. Hinton, et al. Adaptive Mixtures of Local Experts, 1991, Neural Computation.

[10] Katsushi Ikeuchi, et al. Toward automatic robot instruction from perception-recognizing a grasp from observation, 1993, IEEE Trans. Robotics Autom.

[11] Masayuki Inaba, et al. Learning by watching: extracting reusable task knowledge from visual observation of human performance, 1994, IEEE Trans. Robotics Autom.

[12] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.

[13] Claude Sammut, et al. A Framework for Behavioural Cloning, 1995, Machine Intelligence 15.

[14] Rui Camacho, et al. Behavioral Cloning: A Correction, 1995, AI Mag.

[15] S. Schaal, et al. A Kendama Learning Robot Based on Bi-directional Theory, 1996, Neural Networks.

[16] Jeff G. Schneider, et al. Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning, 1996, NIPS.

[17] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[18] Stefan Schaal, et al. Robot Learning From Demonstration, 1997, ICML.

[19] Huaiyu Zhu. On Information and Sufficiency, 1997.

[20] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[21] Stuart J. Russell. Learning agents for uncertain environments (extended abstract), 1998, COLT.

[22] Claude Sammut, et al. Learning to Fly, 1992, ICML.

[23] Christopher G. Atkeson, et al. Constructive Incremental Learning from Only Local Information, 1998, Neural Computation.

[24] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[25] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.

[26] Stefan Schaal, et al. Is imitation learning the route to humanoid robots?, 1999, Trends in Cognitive Sciences.

[27] Stefan Schaal, et al. Locally Weighted Projection Regression: An O(n) Algorithm for Incremental Real Time Learning in High Dimensional Space, 2000.

[28] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.

[29] Mitsuo Kawato, et al. MOSAIC Model for Sensorimotor Learning and Control, 2001, Neural Computation.

[30] Steven Lemm, et al. A Dynamic HMM for On-line Segmentation of Sequential Data, 2001, NIPS.

[31] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[32] Jun Nakanishi, et al. Learning Attractor Landscapes for Learning Motor Primitives, 2002, NIPS.

[33] Jun Nakanishi, et al. Movement imitation with nonlinear dynamical systems in humanoid robots, 2002, ICRA.

[34] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.

[35] Richard E. Parent, et al. Computer animation - algorithms and techniques, 2012.

[36] John Langford, et al. Cost-sensitive learning by cost-proportionate example weighting, 2003, ICDM.

[37] Jun Nakanishi, et al. Learning Movement Primitives, 2005, ISRR.

[38] Anand Rangarajan, et al. A new point matching algorithm for non-rigid registration, 2003, Comput. Vis. Image Underst.

[39] Andrew W. Moore, et al. Locally Weighted Learning for Control, 1997, Artificial Intelligence Review.

[40] Yoshihiko Nakamura, et al. Embodied Symbol Emergence Based on Mimesis Theory, 2004, Int. J. Robotics Res.

[41] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.

[42] Ales Ude, et al. Programming full-body movements for humanoid robots by observation, 2004, Robotics Auton. Syst.

[43] Ben Taskar, et al. Learning structured prediction models: a large margin approach, 2005, ICML.

[44] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[45] Stefan Schaal, et al. Incremental Online Learning in High Dimensions, 2005, Neural Computation.

[46] Yann LeCun, et al. Off-Road Obstacle Avoidance through End-to-End Learning, 2005, NIPS.

[47] E. Todorov, et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, 2005, American Control Conference.

[48] Thomas Hofmann, et al. Large Margin Methods for Structured and Interdependent Output Variables, 2005, J. Mach. Learn. Res.

[49] J. Andrew Bagnell, et al. Maximum margin planning, 2006, ICML.

[50] Rajesh P. N. Rao, et al. Dynamic Imitation in a Humanoid Robot through Nonparametric Probabilistic Inference, 2006, Robotics: Science and Systems.

[51] Hans Ulrich Simon, et al. Learning Theory: 19th Annual Conference on Learning Theory, 2006, COLT.

[52] Rajesh P. N. Rao, et al. Learning Nonparametric Models for Probabilistic Imitation, 2006, NIPS.

[53] Miroslav Dudík, et al. Maximum Entropy Distribution Estimation with Generalized Regularization, 2006, COLT.

[54] David M. Bradley, et al. Boosting Structured Prediction for Imitation Learning, 2006, NIPS.

[55] Aude Billard, et al. Incremental learning of gestures by imitation in a humanoid robot, 2007, HRI.

[56] Thomas Hofmann, et al. Predicting Structured Data (Neural Information Processing), 2007.

[57] Atsushi Nakazawa, et al. Learning from Observation Paradigm: Leg Task Models for Enabling a Biped Humanoid Robot to Imitate Human Dances, 2007, Int. J. Robotics Res.

[58] P. Fearnhead, et al. On-line inference for multiple changepoint problems, 2007.

[59] Aude Billard, et al. On Learning, Representing, and Generalizing a Task in a Humanoid Robot, 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[60] Pieter Abbeel, et al. Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion, 2007, NIPS.

[61] Eyal Amir, et al. Bayesian Inverse Reinforcement Learning, 2007, IJCAI.

[62] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.

[63] Pascal Poupart, et al. Model-based Bayesian Reinforcement Learning in Partially Observable Domains, 2008, ISAIM.

[64] Dana Kulic, et al. Incremental Learning, Clustering and Hierarchy Formation of Whole Body Motion Patterns using Adaptive Hidden Markov Chains, 2008, Int. J. Robotics Res.

[65] Danica Kragic, et al. Robot Learning from Demonstration: A Task-level Planning Approach, 2008.

[66] Betty J. Mohler, et al. Learning perceptual coupling for motor primitives, 2008, IROS.

[67] Joelle Pineau, et al. Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs, 2008, ICML.

[68] Pieter Abbeel, et al. Learning for control from multiple demonstrations, 2008, ICML.

[69] Jan Peters, et al. Learning motor primitives for robotics, 2009, ICRA.

[70] Stefan Schaal, et al. Learning and generalization of motor skills by learning from demonstration, 2009, ICRA.

[71] Aude Billard, et al. Statistical Learning by Imitation of Competing Constraints in Joint Space and Task Space, 2009, Adv. Robotics.

[72] John Langford, et al. Search-based structured prediction, 2009, Machine Learning.

[73] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.

[74] Rachid Alami, et al. A Hybrid Approach to Intricate Motion, Manipulation and Task Planning, 2009, Int. J. Robotics Res.

[75] Kee-Eung Kim, et al. Inverse Reinforcement Learning in Partially Observable Environments, 2009, IJCAI.

[76] David Silver, et al. Learning to search: Functional gradient techniques for imitation learning, 2009, Auton. Robots.

[77] Bernhard Sendhoff, et al. Creating Brain-Like Intelligence: From Basic Principles to Complex Intelligent Systems, 2009.

[78] Chris L. Baker, et al. Action understanding as inverse planning, 2009, Cognition.

[79] Michael I. Jordan, et al. Sharing Features among Dynamical Systems with Beta Processes, 2009, NIPS.

[80] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive Computation and Machine Learning.

[81] Stefan Schaal, et al. Biologically-inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance, 2009, ICRA.

[82] Manuel Lopes, et al. Active Learning for Reward Estimation in Inverse Reinforcement Learning, 2009, ECML/PKDD.

[83] Anil K. Bera, et al. Maximum entropy autoregressive conditional heteroskedasticity model, 2009.

[84] Csaba Szepesvári, et al. Training parsers by inverse reinforcement learning, 2009, Machine Learning.

[85] Manuela M. Veloso, et al. Interactive Policy Learning through Confidence-Based Autonomy, 2014, J. Artif. Intell. Res.

[86] J. Maryniak, et al. Configurations of the Graf-Boklev (V-Style) Ski Jumper Model and Aerodynamic Parameters in a Wind Tunnel, 2009.

[87] Yoshihiko Nakamura, et al. Mimesis Model from Partial Observations for a Humanoid Robot, 2010, Int. J. Robotics Res.

[88] Pieter Abbeel, et al. Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations, 2010, ICRA.

[89] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.

[90] Oskar von Stryk, et al. BioRob-Arm: A Quickly Deployable and Intrinsically Safe, Light-Weight Robot Arm for Service Robotics Applications, 2010, ISR/ROBOTIK.

[91] J. Andrew Bagnell, et al. Boosted Backpropagation Learning for Training Deep Modular Networks, 2010, ICML.

[92] Emanuel Todorov, et al. Inverse Optimal Control with Linearly-Solvable MDPs, 2010, ICML.

[93] Pieter Abbeel, et al. Autonomous Helicopter Aerobatics through Apprenticeship Learning, 2010, Int. J. Robotics Res.

[94] Shunzheng Yu, et al. Hidden semi-Markov models, 2010, Artif. Intell.

[95] J. Andrew Bagnell, et al. Efficient Reductions for Imitation Learning, 2010, AISTATS.

[96] Darwin G. Caldwell, et al. Learning and Reproduction of Gestures by Imitation, 2010, IEEE Robotics & Automation Magazine.

[97] Marc Peter Deisenroth. Efficient reinforcement learning using Gaussian processes, 2010.

[98] James Andrew Bagnell. Learning in modular systems, 2010.

[99] Kristian Kersting, et al. Multi-Agent Inverse Reinforcement Learning, 2010, ICMLA.

[100] Motoaki Kawanabe, et al. Dimensionality reduction for density ratio estimation in high-dimensional spaces, 2010, Neural Networks.

[101] Yoshihiko Nakamura, et al. Mimetic Communication Model with Compliant Physical Contact in Human-Humanoid Interaction, 2010, Int. J. Robotics Res.

[102] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[103] David Silver, et al. Learning from Demonstration for Autonomous Navigation in Complex Unstructured Terrain, 2010, Int. J. Robotics Res.

[104] Aude Billard, et al. Learning Stable Nonlinear Dynamical Systems With Gaussian Mixture Models, 2011, IEEE Transactions on Robotics.

[105] Sergey Levine, et al. Nonlinear Inverse Reinforcement Learning with Gaussian Processes, 2011, NIPS.

[106] Aude Billard, et al. Learning Non-linear Multivariate Dynamics of Motion in Robotic Manipulators, 2011, Int. J. Robotics Res.

[107] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.

[108] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.

[109] Kevin Waugh, et al. Computational Rationalization: The Inverse Equilibrium Problem, 2011, ICML.

[110] Kee-Eung Kim, et al. MAP Inference for Bayesian Inverse Reinforcement Learning, 2011, NIPS.

[111] Dongheui Lee, et al. Incremental kinesthetic teaching of motion primitives using the motion refinement tube, 2011, Auton. Robots.

[112] Christopher G. Atkeson, et al. Optimization and learning for rough terrain legged locomotion, 2011, Int. J. Robotics Res.

[113] Jan Peters, et al. Model learning for robot control: a survey, 2011, Cognitive Processing.

[114] Jan Peters, et al. Relative Entropy Inverse Reinforcement Learning, 2011, AISTATS.

[115] Darwin G. Caldwell, et al. Encoding the time and space constraints of a task in explicit-duration Hidden Markov Model, 2011, IROS.

[116] Martial Hebert, et al. Activity Forecasting, 2012, ECCV.

[117] Oliver Kroemer, et al. Learning to select and generalize striking movements in robot table tennis, 2012, AAAI Fall Symposium: Robots Learning Interactively from Human Teachers.

[118] Aude Billard, et al. Coupled dynamical system based arm-hand grasping model for learning fast adaptation strategies, 2012, Robotics Auton. Syst.

[119] David Silver, et al. Active learning from demonstration for robust autonomous navigation, 2012, ICRA.

[120] R. Serfozo. Basics of Applied Stochastic Processes, 2012.

[121] Scott Kuindersma, et al. Robot learning from demonstration by constructing skill trees, 2012, Int. J. Robotics Res.

[122] Yuval Tassa, et al. Synthesis and stabilization of complex behaviors through online trajectory optimization, 2012, IROS.

[123] David Silver, et al. Learning Autonomous Driving Styles and Maneuvers from Expert Demonstration, 2012, ISER.

[124] Oliver Kroemer, et al. Structured Apprenticeship Learning, 2012, ECML/PKDD.

[125] Sergey Levine, et al. Continuous Inverse Optimal Control with Locally Optimal Examples, 2012, ICML.

[126] Martial Hebert, et al. Learning monocular reactive UAV control in cluttered natural environments, 2013, ICRA.

[127] Stefan Schaal, et al. Learning objective functions for manipulation, 2013, ICRA.

[128] Jan Peters, et al. Probabilistic Movement Primitives, 2013, NIPS.

[129] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, ICASSP.

[130] Pieter Abbeel, et al. Learning from Demonstrations Through the Use of Non-rigid Registration, 2013, ISRR.

[131] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.

[132] Erik B. Sudderth. Introduction to statistical machine learning, 2016.

[133] P. Olver. Nonlinear Systems, 2013.

[134] Jun Nakanishi, et al. Dynamical Movement Primitives: Learning Attractor Models for Motor Behaviors, 2013, Neural Computation.

[135] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.

[136] Anind K. Dey, et al. The Principle of Maximum Causal Entropy for Estimating Interacting Processes, 2013, IEEE Transactions on Information Theory.

[137] Siddhartha S. Srinivasa, et al. CHOMP: Covariant Hamiltonian optimization for motion planning, 2013, Int. J. Robotics Res.

[138] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[139] N. Bambos, et al. Infinite time horizon maximum causal entropy inverse reinforcement learning, 2014, CDC.

[140] Leslie Pack Kaelbling, et al. Constructing Symbolic Representations for High-Level Planning, 2014, AAAI.

[141] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[142] Prashant Doshi, et al. Multi-robot inverse reinforcement learning under occlusion with interactions, 2014, AAMAS.

[143] Andrej Gams, et al. Coupling Movement Primitives: Interaction With the Environment and Bimanual Tasks, 2014, IEEE Transactions on Robotics.

[144] Katsushi Ikeuchi, et al. Toward a Dancing Robot With Listening Capability: Keypose-Based Integration of Lower-, Middle-, and Upper-Body Motions for Varying Music Tempos, 2014, IEEE Transactions on Robotics.

[145] Oliver Kroemer, et al. Interaction primitives for human-robot cooperation tasks, 2014, ICRA.

[146] Mamoru Mitsuishi, et al. Online Trajectory Planning in Dynamic Environments for Surgical Task Automation, 2014, Robotics: Science and Systems.

[147] Alessandro Saffiotti, et al. Efficiently combining task and motion planning using geometric constraints, 2014, Int. J. Robotics Res.

[148] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[149] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.

[150] Aude Billard, et al. Catching Objects in Flight, 2014, IEEE Transactions on Robotics.

[151] J. Andrew Bagnell, et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning, 2014, arXiv.

[152] Aude Billard, et al. Learning control Lyapunov function to ensure stability of dynamical system-based robot reaching motions, 2014, Robotics Auton. Syst.

[153] Oliver Kroemer, et al. Learning to predict phases of manipulation tasks as hidden states, 2014, ICRA.

[154] Aude Billard, et al. Learning robotic eye-arm-hand coordination from human demonstration: a coupled dynamical systems approach, 2014, Biological Cybernetics.

[155] Peter Englert, et al. Multi-task policy search for robotics, 2014, ICRA.

[156] Yoshua Bengio, et al. A Recurrent Latent Variable Model for Sequential Data, 2015, NIPS.

[157] Oliver Kroemer, et al. Active reward learning with a novel acquisition function, 2015, Auton. Robots.

[158] Masashi Sugiyama, et al. Conditional Density Estimation with Dimensionality Reduction via Squared-Loss Conditional Entropy Minimization, 2015, Neural Computation.

[159] J. A. Bagnell. An Invitation to Imitation, 2015.

[160] Sandy H. Huang, et al. Leveraging appearance priors in non-rigid registration, with application to manipulation of deformable objects, 2015, IROS.

[161] Kee-Eung Kim, et al. Hierarchical Bayesian Inverse Reinforcement Learning, 2015, IEEE Transactions on Cybernetics.

[162] David Vandyke, et al. Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems, 2015, EMNLP.

[163] Jochen J. Steil, et al. Open-source benchmarking for learned reaching motion generation in robotics, 2015, Paladyn J. Behav. Robotics.

[164] Aude Billard, et al. Incremental motion learning with locally modulated dynamical systems, 2015, Robotics Auton. Syst.

[165] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.

[166] Carl E. Rasmussen, et al. Gaussian Processes for Data-Efficient Learning in Robotics and Control, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[167] Jan Peters, et al. Learning movement primitive attractor goals and sequential skills from kinesthetic demonstrations, 2015, Robotics Auton. Syst.

[168] Pieter Abbeel, et al. A non-rigid point and normal registration algorithm with applications to learning from demonstrations, 2015, ICRA.

[169] Scott Niekum, et al. Learning grounded finite-state representations from unstructured demonstrations, 2015, Int. J. Robotics Res.

[170] Siddhartha S. Srinivasa, et al. Movement primitives via optimization, 2015, ICRA.

[171] Marc Toussaint, et al. Direct Loss Minimization Inverse Optimal Control, 2015, Robotics: Science and Systems.

[172] Jan Peters, et al. Learning multiple collaborative tasks with a mixture of Interaction Primitives, 2015, ICRA.

[173] Yoshihiko Nakamura, et al. Statistical mutual conversion between whole body motion primitives and linguistic sentences for human motions, 2015, Int. J. Robotics Res.

[174] Sergey Levine, et al. Learning force-based manipulation of deformable objects from multiple demonstrations, 2015, ICRA.

[175] David Pfau, et al. Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[176] Volodymyr Kuleshov, et al. Inverse Game Theory: Learning Utilities in Succinct Games, 2015, WINE.

[177] Thorsten Joachims, et al. Learning preferences for manipulation tasks from online coactive feedback, 2015, Int. J. Robotics Res.

[178] Masashi Sugiyama, et al. Statistical Reinforcement Learning: Modern Machine Learning Approaches, 2015, Chapman and Hall/CRC Machine Learning and Pattern Recognition Series.

[179] Martial Hebert, et al. Improving Multi-Step Prediction of Learned Time Series Models, 2015, AAAI.

[180] Oliver Kroemer, et al. Towards learning hierarchical skills for multi-phase manipulation tasks, 2015, ICRA.

[181] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[182] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.

[183] John Langford, et al. Learning to Search Better than Your Teacher, 2015, ICML.

[184] Prashant Doshi, et al. Toward Estimating Others' Transition Models Under Occlusion for Multi-Robot IRL, 2015, IJCAI.

[185] Sylvain Calinon, et al. Robot Learning with Task-Parameterized Generative Models, 2015, ISRR.

[186] Sergey Levine, et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, 2016, ICML.

[187] Darwin G. Caldwell, et al. Learning Controllers for Reactive and Proactive Behaviors in Human-Robot Collaboration, 2016, Front. Robot. AI.

[188] Stefano Ermon, et al. Model-Free Imitation Learning with Policy Optimization, 2016, ICML.

[189] Anca D. Dragan, et al. Cooperative Inverse Reinforcement Learning, 2016, NIPS.

[190] Andrej Gams, et al. Learning Compliant Movement Primitives Through Demonstration and Statistical Generalization, 2016, IEEE/ASME Transactions on Mechatronics.

[191] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.

[192] Jan Peters, et al. Incremental imitation learning of context-dependent motor skills, 2016, Humanoids.

[193] Hany Abdulsamad, et al. Optimal control and inverse optimal control by distribution matching, 2016, IROS.

[194] Martial Hebert, et al. Improved Learning of Dynamics Models for Control, 2016, ISER.

[195] Yoshihiko Nakamura, et al. Real-time Unsupervised Segmentation of human whole-body motion and its application to humanoid robot acquisition of motion symbols, 2016, Robotics Auton. Syst.

[196] Sergey Levine, et al. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models, 2016, arXiv.

[197] Shimon Whiteson, et al. Inverse Reinforcement Learning from Failure, 2016, AAMAS.

[198] Sylvain Calinon, et al. A tutorial on task-parameterized movement learning and retrieval, 2016, Intell. Serv. Robotics.

[199] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.

[200] Alberto Montebelli, et al. Learning in-contact control strategies from demonstration, 2016, IROS.

[201] Dana Kulic, et al. Expectation-Maximization for Inverse Reinforcement Learning with Hidden Data, 2016, AAMAS.

[202] Sanjay Krishnan, et al. HIRL: Hierarchical Inverse Reinforcement Learning for Long-Horizon Tasks with Delayed Rewards, 2016, arXiv.

[203] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.

[204] Sebastian Nowozin, et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, 2016, NIPS.

[205] Pieter Abbeel, et al. Third-Person Imitation Learning, 2017, ICLR.

[206] Sergey Levine, et al. One-Shot Visual Imitation Learning via Meta-Learning, 2017, CoRL.

[207] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2015, CVPR.

[208] Oliver Kroemer, et al. Probabilistic movement primitives for coordination of multiple human-robot collaborative tasks, 2017, Auton. Robots.

[209] Marcin Andrychowicz, et al. One-Shot Imitation Learning, 2017, NIPS.

[210] Jitendra Malik, et al. Combining self-supervised learning and imitation for vision-based rope manipulation, 2017, ICRA.

[211] Sergey Levine, et al. Unsupervised Perceptual Rewards for Imitation Learning, 2016, Robotics: Science and Systems.

[212] Byron Boots, et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, 2017, ICML.

[213] Shie Mannor, et al. End-to-End Differentiable Adversarial Imitation Learning, 2017, ICML.

[214] Jan Peters, et al. Guiding Trajectory Optimization by Demonstrated Distributions, 2017, IEEE Robotics and Automation Letters.

[215] Sylvain Calinon, et al. Supervisory teleoperation with online learning and optimal control, 2017, ICRA.

[216] Jonathan Lee, et al. Iterative Noise Injection for Scalable Imitation Learning, 2017, arXiv.

[217] Stefan Schaal, et al. Learning from Demonstration, 1996, NIPS.

[218] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.

[219] Sergey Levine, et al. Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning, 2017, ICLR.

[220] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[221] Yoshihiko Nakamura, et al. Planning of goal-oriented motion from stochastic motion primitives and optimal controlling of joint torques in whole-body, 2017, Robotics Auton. Syst.

[222] Jan Peters, et al. Active Incremental Learning of Robot Movement Primitives, 2017, CoRL.

[223] Jan Peters, et al. Learning movement primitive libraries through probabilistic segmentation, 2017, Int. J. Robotics Res.

[224] Mamoru Mitsuishi, et al. Online Trajectory Planning and Force Control for Automation of Surgical Tasks, 2018, IEEE Transactions on Automation Science and Engineering.

[225] Sergey Levine, et al. Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation, 2018, ICRA.

[226] Joelle Pineau, et al. OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning, 2017, AAAI.

[227] Rouhollah Rahmatizadeh, et al. Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-to-End Learning from Demonstration, 2018, ICRA.