Benchmarking Reinforcement Learning Algorithms on Tetherball Games

Robotic applications in the real world face stochastic and uncertain environments, which renders hand-tuned controllers impractical. Reinforcement learning offers a flexible alternative: rather than engineering each scenario separately, the robot adapts to changing environment dynamics by learning from interaction. Policy search methods, a sub-field of reinforcement learning, scale particularly well to robotic applications by directly optimizing the robot's policy parameters for the task at hand. In a real-world scenario, the same task can occur in a multitude of settings or contexts, and instead of engineering a solution for each context, contextual policy search methods generalize their knowledge of the task across contexts, making the robot's policy more versatile. This thesis reviews and benchmarks four contextual policy search algorithms and empirically evaluates their sample efficiency, scalability, and performance on multidimensional benchmark functions and on a simulated robot tetherball task.

Zusammenfassung

Robots are exposed to a large degree of variability and uncertainty in the real world. This limits the use of classical, finely tuned solutions, since it is impossible to account for every scenario. Reinforcement learning offers a flexible alternative by giving robots the ability to learn from interaction and to adapt to a dynamic environment. Policy search methods form one category within reinforcement learning. Such methods currently show great potential in robotics, as they enable a robot to learn many variations of a task across different contexts simultaneously and to generalize to unseen situations. In this thesis, so-called contextual policy search methods are examined and empirically evaluated with respect to efficiency, scalability, and optimality, using both various high-dimensional test functions and a simulated tetherball task.
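The contextual policy search idea described above can be made concrete with a minimal sketch: an upper-level policy maps a task context s to the parameters θ of a lower-level controller, and is improved from episodic rollouts by a reward-weighted maximum-likelihood update, in the spirit of reward-weighted regression. This is an illustrative toy example, not one of the four algorithms benchmarked in the thesis; the reward function, dimensions, and constants below are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_context, dim_theta = 2, 3

# Upper-level policy pi(theta | s) = N(theta | W @ s_aug, Sigma),
# where s_aug = [s, 1] appends a bias term to the context.
W = np.zeros((dim_theta, dim_context + 1))
Sigma = np.eye(dim_theta)

def reward(theta, s):
    # Hypothetical toy objective: the optimal parameters depend on the
    # context, so the policy must generalize rather than memorize.
    target = np.concatenate([s, [0.5]])
    return -np.sum((theta - target) ** 2)

for iteration in range(100):
    contexts, thetas, rewards = [], [], []
    for _ in range(50):                      # rollouts per iteration
        s = rng.uniform(-1, 1, dim_context)  # sample a task context
        s_aug = np.append(s, 1.0)
        theta = rng.multivariate_normal(W @ s_aug, Sigma)
        contexts.append(s_aug)
        thetas.append(theta)
        rewards.append(reward(theta, s))

    S = np.array(contexts)   # (N, dim_context + 1)
    Th = np.array(thetas)    # (N, dim_theta)
    R = np.array(rewards)    # (N,)

    # Exponentiated-reward weights with an adaptive temperature, as in
    # reward-weighted regression style episodic policy search.
    beta = 5.0 / (R.max() - R.min() + 1e-8)
    w = np.exp(beta * (R - R.max()))

    # Weighted maximum-likelihood update of the context-to-parameter map
    # (a weighted least-squares fit of theta onto the augmented context).
    D = np.diag(w)
    W = np.linalg.solve(S.T @ D @ S + 1e-6 * np.eye(S.shape[1]),
                        S.T @ D @ Th).T

    # Weighted ML update of the exploration covariance, with a small
    # regularizer to keep it positive definite.
    diff = Th - S @ W.T
    Sigma = (diff.T * w) @ diff / w.sum() + 1e-6 * np.eye(dim_theta)

    if iteration % 20 == 0:
        print(f"iteration {iteration}: mean reward {R.mean():.3f}")
```

The key design choice this sketch illustrates is that exploration happens in parameter space rather than action space: the robot samples controller parameters θ per episode conditioned on the context, and the update shifts the conditional Gaussian toward parameter samples that earned high reward in their respective contexts.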
