Automatic State Abstraction from Demonstration

Learning from Demonstration (LfD) is a popular technique for building decision-making agents with human help. Traditional LfD methods treat demonstrations as training examples for supervised learning, but complex tasks can require more examples than it is practical to collect. We present Abstraction from Demonstration (AfD), a novel form of LfD that uses demonstrations to infer a state abstraction, and then applies reinforcement learning (RL) in the resulting abstract state space to build a policy. Empirical results show that AfD is more than an order of magnitude more sample-efficient than using demonstrations directly as training examples, and exponentially faster than RL alone.
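To make the two-stage idea concrete, here is a minimal sketch of one plausible instantiation: the demonstrations are treated as a supervised problem (state features predicting the demonstrated action) to rank and select features, and tabular Q-learning then runs over states projected onto the selected features. The mutual-information scorer, the `env` interface (`reset`, `step`, `n_actions`, `sample_action`), and the rounding-based discretization are illustrative assumptions, not the paper's own construction.

```python
import numpy as np
from collections import defaultdict
from sklearn.feature_selection import mutual_info_classif


def select_relevant_features(demo_states, demo_actions, k):
    """Rank state features by how well each predicts the demonstrated
    action and keep the top k. Mutual information is an assumed
    stand-in scorer; the paper does not prescribe this choice."""
    scores = mutual_info_classif(demo_states, demo_actions)
    return np.argsort(scores)[-k:]


def q_learning_on_abstraction(env, features, episodes=500,
                              alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning over the abstract state: the full state
    projected onto `features` and discretized by rounding so it can
    be used as a table key. `env` is a hypothetical interface."""
    Q = defaultdict(lambda: np.zeros(env.n_actions))

    def abstract(state):
        # Project onto the demonstration-selected features only.
        return tuple(np.round(np.asarray(state)[features], 1))

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            z = abstract(s)
            # Epsilon-greedy action selection in the abstract space.
            a = (env.sample_action() if np.random.rand() < eps
                 else int(np.argmax(Q[z])))
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * np.max(Q[abstract(s2)]))
            Q[z][a] += alpha * (target - Q[z][a])
            s = s2
    return Q
```

Under these assumptions, the demonstrations are consumed only by the feature-selection step; the policy itself is learned by ordinary RL, which is what lets AfD get by with far fewer demonstrations than supervised imitation.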
