Reducing the Ratio Between Learning Complexity and Number of Time-Varying Variables in Fully Recurrent Nets

Let m be the number of time-varying variables for storing temporal events in a fully recurrent sequence processing network. Let R_time be the ratio between the number of operations per time step (for an exact gradient-based supervised sequence learning algorithm) and m. Let R_space be the ratio between the maximum number of storage cells necessary for learning arbitrary sequences and m. With conventional recurrent nets, m equals the number of units. With the popular ‘real-time recurrent learning’ algorithm (RTRL), R_time = O(m^3) and R_space = O(m^2). With ‘back-propagation through time’ (BPTT), R_time = O(m) (much better than with RTRL), but R_space is unbounded, since the required storage grows with sequence length (much worse than with RTRL). The contribution of this paper is a novel fully recurrent network and a corresponding exact gradient-based learning algorithm with R_time = O(m) (as good as with BPTT) and R_space = O(m^2) (as good as with RTRL).
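For reference, the quoted ratios follow from the standard complexity arguments for RTRL and BPTT; the sketch below is an editorial addition, not part of the abstract, and the symbols y_k (activations), w_{ij} (weights), net_k (net inputs), and f' (activation-function derivative) are conventional RTRL notation assumed here. The two ratios are

\[
R_{\text{time}} = \frac{\#\,\text{operations per time step}}{m},
\qquad
R_{\text{space}} = \frac{\#\,\text{storage cells for arbitrary sequences}}{m},
\]

and RTRL maintains the sensitivities

\[
p^{k}_{ij}(t) = \frac{\partial y_k(t)}{\partial w_{ij}},
\qquad
p^{k}_{ij}(t+1) = f'\big(net_k(t+1)\big)\Big[\sum_{l} w_{kl}\, p^{l}_{ij}(t) + \delta_{ki}\, y_j(t)\Big].
\]

There are O(m^3) sensitivities and each update sums over l, so RTRL needs O(m^3) storage cells and O(m^4) operations per step; dividing by m gives the stated R_space = O(m^2) and R_time = O(m^3). BPTT instead back-propagates through a stored activation history, costing only O(m^2) operations per step (R_time = O(m)) but requiring storage proportional to the sequence length, hence the unbounded R_space.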