Reducing the Ratio Between Learning Complexity and Number of Time-Varying Variables in Fully Recurrent Nets

Let m be the number of time-varying variables for storing temporal events in a fully recurrent sequence processing network. Let R_time be the ratio between the number of operations per time step (for an exact gradient-based supervised sequence learning algorithm) and m. Let R_space be the ratio between the maximum number of storage cells necessary for learning arbitrary sequences and m. With conventional recurrent nets, m equals the number of units. With the popular ‘real-time recurrent learning’ algorithm (RTRL), R_time = O(m^3) and R_space = O(m^2). With ‘back-propagation through time’ (BPTT), R_time = O(m) (much better than with RTRL), but R_space is unbounded, since the required storage grows with sequence length (much worse than with RTRL). The contribution of this paper is a novel fully recurrent network and a corresponding exact gradient-based learning algorithm with R_time = O(m) (as good as with BPTT) and R_space = O(m^2) (as good as with RTRL).
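For reference, the quoted ratios follow from the standard complexity arguments for RTRL and BPTT; the sketch below is an editorial addition, not part of the abstract, and the symbols y_k (activations), w_{ij} (weights), net_k (net inputs), and f' (activation-function derivative) are conventional RTRL notation assumed here. The two ratios are

\[
R_{\text{time}} = \frac{\#\,\text{operations per time step}}{m},
\qquad
R_{\text{space}} = \frac{\#\,\text{storage cells for arbitrary sequences}}{m},
\]

and RTRL maintains the sensitivities

\[
p^{k}_{ij}(t) = \frac{\partial y_k(t)}{\partial w_{ij}},
\qquad
p^{k}_{ij}(t+1) = f'\big(net_k(t+1)\big)\Big[\sum_{l} w_{kl}\, p^{l}_{ij}(t) + \delta_{ki}\, y_j(t)\Big].
\]

There are O(m^3) sensitivities and each update sums over l, so RTRL needs O(m^3) storage cells and O(m^4) operations per step; dividing by m gives the stated R_space = O(m^2) and R_time = O(m^3). BPTT instead back-propagates through a stored activation history, costing only O(m^2) operations per step (R_time = O(m)) but requiring storage proportional to the sequence length, hence the unbounded R_space.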