Continuous control with deep reinforcement learning

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an actor neural network used to select actions to be performed by an agent that interacts with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch, processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
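The update sequence summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes linear stand-ins for the actor and critic neural networks, assumes each experience tuple holds an observation, an action, a reward, and a next observation, and assumes the target output is the reward plus a discounted critic value of the next observation computed from the current parameters. The names `actor`, `critic`, `update`, `gamma`, `lr_critic`, and `lr_actor` are hypothetical.

```python
import numpy as np

def critic(w, obs, act):
    # Assumed linear critic: Q(s, a) = [s, a] . w
    return np.concatenate([obs, act], axis=1) @ w

def actor(theta, obs):
    # Assumed linear deterministic actor: a = s @ theta
    return obs @ theta

def update(theta, w, batch, gamma=0.99, lr_critic=1e-2, lr_actor=1e-3):
    obs, act, rew, next_obs = batch
    n, obs_dim = obs.shape

    # Critic output for each experience tuple in the minibatch.
    q = critic(w, obs, act)

    # Target output for each tuple (assumed form: reward plus discounted
    # critic value of the actor's action at the next observation).
    y = rew + gamma * critic(w, next_obs, actor(theta, next_obs))

    # Critic update from the errors between outputs and targets
    # (gradient step on half the mean squared error, targets held fixed).
    err = q - y
    x = np.concatenate([obs, act], axis=1)
    w_new = w - lr_critic * (x.T @ err) / n

    # Actor update using the critic: ascend the critic's value of the
    # actor's own actions. For the linear critic, dQ/da is the action
    # part of w, and dQ/dtheta averaged over the batch is
    # outer(mean observation, dQ/da).
    dq_da = w_new[obs_dim:]
    grad_theta = np.outer(obs.mean(axis=0), dq_da)
    theta_new = theta + lr_actor * grad_theta

    return theta_new, w_new, float(np.mean(err ** 2))

# Usage on a synthetic minibatch (random data, for illustration only).
rng = np.random.default_rng(0)
n, obs_dim, act_dim = 32, 4, 2
theta = rng.normal(size=(obs_dim, act_dim))
w = rng.normal(size=obs_dim + act_dim)
obs = rng.normal(size=(n, obs_dim))
act = rng.normal(size=(n, act_dim))
rew = rng.normal(size=n)
next_obs = rng.normal(size=(n, obs_dim))

theta_new, w_new, loss = update(theta, w, (obs, act, rew, next_obs))
```

The critic is updated first so that the actor step uses the refreshed critic parameters, matching the order given in the abstract. A practical system would replace the linear models with neural networks and compute the gradients by backpropagation.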