Reinforcement learning methods for multi-linked manipulator obstacle avoidance and control

This paper treats the multi-linked manipulator obstacle avoidance and control task as an interaction between a learning agent and an unknown environment. The role of the agent is to generate actions that maximise the reward it receives from the environment. We demonstrate how two learning algorithms common in the reinforcement learning literature, the adaptive heuristic critic (AHC) (Barto et al., 1983) and Q-learning (Watkins, 1989), can be used to solve the task successfully in two different ways: 1) through the generation of position commands to a PD controller, which produces torque commands to drive the manipulator, and 2) through the direct generation of torque commands, removing the need for a PD controller. In the process, the inverse kinematics problem for multi-linked manipulators is solved automatically. Fast function approximation is achieved through the use of an array of cerebellar model arithmetic computers (CMACs). The generation of both discrete and continuous actions is investigated, and the performance of the algorithms is evaluated in terms of learning rates, efficiency of solutions, and memory requirements.
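The pairing of Q-learning with CMAC function approximation described above amounts to tile coding: each CMAC tiling activates one coarse cell per state, and Q(s, a) is the sum of the active weights for action a, with the temporal-difference error spread across those weights. The following is a minimal illustrative sketch of that idea, not the paper's implementation; the class name, tiling counts, action set, and learning-rate values are all assumptions made for the example.

```python
import numpy as np

# Illustrative sketch of tile-coded (CMAC-style) Q-learning with discrete
# actions. All names and hyperparameters are assumptions, not the paper's.

N_TILINGS = 8            # number of overlapping CMAC tilings (assumed)
TILES_PER_DIM = 10       # coarse resolution per state dimension (assumed)
N_ACTIONS = 3            # e.g. per-joint torque {-tau, 0, +tau} (assumed)
ALPHA = 0.1 / N_TILINGS  # step size shared across the active tiles
GAMMA = 0.95             # discount factor

class CMACQ:
    """One weight table per tiling; Q(s, a) = sum of active weights."""

    def __init__(self, state_dim):
        self.state_dim = state_dim
        # one weight per (tiling, tile, action)
        self.w = np.zeros((N_TILINGS, TILES_PER_DIM ** state_dim, N_ACTIONS))
        # random offsets displace the tilings against each other,
        # giving the coarse-coded generalisation CMACs are known for
        self.offsets = np.random.rand(N_TILINGS, state_dim) / TILES_PER_DIM

    def _tiles(self, s):
        """Index of the single active tile in each tiling, s in [0, 1]^d."""
        idx = []
        for t in range(N_TILINGS):
            coords = np.floor((s + self.offsets[t]) * TILES_PER_DIM).astype(int)
            coords = np.clip(coords, 0, TILES_PER_DIM - 1)
            idx.append(np.ravel_multi_index(
                tuple(coords), (TILES_PER_DIM,) * self.state_dim))
        return idx

    def q(self, s):
        """Q-values for all actions: sum of the active weight in each tiling."""
        return sum(self.w[t, i] for t, i in enumerate(self._tiles(s)))

    def update(self, s, a, r, s_next, done):
        """One-step Q-learning backup spread over the active tiles."""
        target = r if done else r + GAMMA * self.q(s_next).max()
        td_error = target - self.q(s)[a]
        for t, i in enumerate(self._tiles(s)):
            self.w[t, i, a] += ALPHA * td_error
```

In the first of the two schemes described in the abstract, the selected action would be interpreted as a position command q_d and converted to torque by a conventional PD law of the form tau = Kp (q_d - q) - Kd q_dot; in the second, the action itself is the torque, so the PD stage disappears.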