Algorithms for Learning Markov Field Policies

We use a graphical model for representing policies in Markov Decision Processes. This new representation can easily incorporate domain knowledge in the form of a state similarity graph that loosely indicates which states are expected to have similar optimal actions. A bias is then introduced into the policy search process by sampling policies from a distribution that assigns high probabilities to policies that agree with the provided state similarity graph, i.e., smoother policies. This distribution corresponds to a Markov Random Field. We also present forward and inverse reinforcement learning algorithms for learning such policy distributions. We illustrate the advantage of the proposed approach on two problems: cart-balancing with swing-up, and teaching a robot to grasp unknown objects.
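
To make the sampling step concrete, the following is a minimal sketch, under simplifying assumptions not taken from the paper: a finite state and action space, a Potts-style pairwise potential that penalizes neighboring states in the similarity graph for choosing different actions, and Gibbs sampling to draw policies from the resulting Markov Random Field. All names (`similarity_edges`, `beta`, `sample_policy`) are illustrative, not the authors' notation.

```python
# Hypothetical sketch of an MRF prior over deterministic policies:
# P(policy) is proportional to exp(-beta * sum of edge weights w(s,t)
# over similarity edges (s,t) where policy[s] != policy[t]).
import math
import random

def log_prior(policy, similarity_edges, beta=1.0):
    """Unnormalized log-probability of a deterministic policy: each
    similarity edge whose endpoints disagree on their action adds a
    penalty proportional to the edge weight."""
    penalty = sum(w for (s, t, w) in similarity_edges if policy[s] != policy[t])
    return -beta * penalty

def sample_policy(states, actions, similarity_edges, beta=1.0, sweeps=100):
    """Gibbs sampling over policies: repeatedly resample each state's
    action conditioned on its neighbors, so policies that are smoother
    with respect to the similarity graph are sampled more often."""
    policy = {s: random.choice(actions) for s in states}
    neighbors = {s: [] for s in states}
    for s, t, w in similarity_edges:
        neighbors[s].append((t, w))
        neighbors[t].append((s, w))
    for _ in range(sweeps):
        for s in states:
            # Conditional over actions at state s given its neighbors.
            logits = [-beta * sum(w for (t, w) in neighbors[s] if policy[t] != a)
                      for a in actions]
            m = max(logits)  # subtract the max for numerical stability
            probs = [math.exp(l - m) for l in logits]
            z = sum(probs)
            r, acc = random.random() * z, 0.0
            for a, p in zip(actions, probs):
                acc += p
                if r <= acc:
                    policy[s] = a
                    break
    return policy

if __name__ == "__main__":
    # Toy chain of states where adjacent states are declared similar.
    states = list(range(6))
    actions = ["left", "right"]
    edges = [(i, i + 1, 1.0) for i in range(5)]
    pi = sample_policy(states, actions, edges, beta=2.0)
    print(pi, log_prior(pi, edges, beta=2.0))
```

In this sketch, a larger `beta` penalizes disagreement along similarity edges more heavily, so sampled policies assign increasingly uniform actions across connected regions of the graph.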
