First-Order Gradient Descent Training of Adaptive Discrete-Time Dynamic Networks

Abstract: This paper describes the training of discrete-time dynamic systems with adaptive parameters (recurrent neural networks) using first-order gradient descent algorithms. To facilitate the explanation of these algorithms, a standard representation of a discrete-time dynamic system is defined. Any differentiable discrete-time dynamic system may be put into this standard representation and trained using a gradient descent algorithm. Using the standard representation, we describe two general types of learning algorithms: the first is based upon the discrete-time Euler-Lagrange equations, and the second upon a recursive update of the output gradients. Both epochwise and on-line versions of these algorithms are presented. When the dynamic system is implemented by a neural network, the epochwise algorithm based on the Euler-Lagrange equations is equivalent to backpropagation-through-time, and the on-line method based on the recursive equation is the same as recursive backpropagation. The epochwise versions of the two algorithms are shown to be equivalent, while the on-line versions are shown to be approximately equivalent.
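To make the abstract's terminology concrete, the following is a minimal sketch of the epochwise, Euler-Lagrange-style gradient computation (backpropagation-through-time) for a small recurrent network with state update x(k+1) = tanh(W x(k) + U u(k)) and output y(k) = V x(k+1), trained on a sum-of-squares output error. The system, the symbols W, U, V, x, u, d, and the network form are illustrative assumptions for this sketch, not the paper's standard representation or notation.

```python
# Hedged sketch: epochwise gradient descent via a backward (adjoint) recursion,
# i.e., backpropagation-through-time for an assumed small recurrent network.
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, n_y, T = 3, 2, 1, 20               # state, input, output sizes; epoch length

W = rng.normal(scale=0.3, size=(n_x, n_x))   # recurrent weights (assumed parameters)
U = rng.normal(scale=0.3, size=(n_x, n_u))   # input weights
V = rng.normal(scale=0.3, size=(n_y, n_x))   # output weights
u = rng.normal(size=(T, n_u))                # input sequence (made-up data)
d = rng.normal(size=(T, n_y))                # desired output sequence (made-up data)

def forward(W, U, V):
    """Run the dynamics over one epoch; return states, pre-activations, outputs."""
    x = np.zeros((T + 1, n_x))
    a = np.zeros((T, n_x))
    y = np.zeros((T, n_y))
    for k in range(T):
        a[k] = W @ x[k] + U @ u[k]
        x[k + 1] = np.tanh(a[k])
        y[k] = V @ x[k + 1]
    return x, a, y

def bptt_gradients(W, U, V):
    """Epochwise gradients of the summed squared error via the backward recursion."""
    x, a, y = forward(W, U, V)
    e = y - d                                 # output errors at each time step
    gW, gU, gV = np.zeros_like(W), np.zeros_like(U), np.zeros_like(V)
    lam = np.zeros(n_x)                       # adjoint (costate) carried backward in time
    for k in reversed(range(T)):
        gV += np.outer(e[k], x[k + 1])
        # error reaching x(k+1): direct output error plus error from later time steps
        delta = (V.T @ e[k] + lam) * (1.0 - np.tanh(a[k]) ** 2)
        gW += np.outer(delta, x[k])
        gU += np.outer(delta, u[k])
        lam = W.T @ delta                     # propagate one step further back
    return gW, gU, gV

# One first-order gradient descent step per epoch (learning rate is arbitrary here).
lr = 0.05
gW, gU, gV = bptt_gradients(W, U, V)
W, U, V = W - lr * gW, U - lr * gU, V - lr * gV
```

The on-line (recursive backpropagation) counterpart described in the abstract would instead update a running sensitivity of the state with respect to the parameters at every time step, allowing a parameter update after each output error rather than once per epoch; the two approaches coincide exactly only in the epochwise setting, as the abstract states.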