Robot Weightlifting By Direct Policy Search

This paper describes a method for structuring a robot motor learning task. By designing a suitably parameterized policy, we show that a simple search algorithm, along with biologically motivated constraints, offers an effective means for motor skill acquisition. The framework makes use of the robot counterparts to several elements found in human motor learning: imitation, equilibrium-point control, motor programs, and synergies. We demonstrate that through learning, coordinated behavior emerges from initial, crude knowledge about a difficult robot weightlifting task.
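To make the idea of direct policy search concrete, below is a minimal sketch of searching directly over policy parameters by evaluating whole trials. It assumes a hypothetical episodic evaluator `rollout_return(theta)` that runs one weightlifting trial with parameters `theta` (for example, via-point targets for an equilibrium-point controller) and returns a scalar score; the perturb-and-keep-if-better search shown here is an illustration of the general approach, not the specific search algorithm or policy parameterization used in the paper.

```python
import numpy as np

def rollout_return(theta: np.ndarray) -> float:
    # Hypothetical stand-in for one robot (or simulated) weightlifting trial:
    # higher return means a better lift. Replace with the real evaluator.
    return -float(np.sum((theta - 0.5) ** 2))

def direct_policy_search(theta0, n_iters=200, step=0.1, seed=0):
    """Simple stochastic hill-climbing over policy parameters."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    best = rollout_return(theta)
    for _ in range(n_iters):
        # Perturb the current parameters and evaluate a full trial.
        candidate = theta + step * rng.standard_normal(theta.shape)
        score = rollout_return(candidate)
        # Keep the perturbation only if the trial improved.
        if score > best:
            theta, best = candidate, score
    return theta, best

if __name__ == "__main__":
    theta_init = np.zeros(4)  # e.g., a crude initial policy obtained by imitation
    theta_star, score = direct_policy_search(theta_init)
    print(theta_star, score)
```

Starting the search from an imitation-derived initial policy, as in the abstract's framework, corresponds here to choosing `theta_init` from a demonstrated movement rather than from scratch.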
