EVALUATING LONG-TERM DEPENDENCY BENCHMARK PROBLEMS BY RANDOM GUESSING

Numerous recent papers focus on standard recurrent nets' problems with tasks involving long-term dependencies. We solve such tasks by random weight guessing (RG). Although RG cannot be viewed as a reasonable learning algorithm, we find that it often outperforms previous, more complex methods on widely used benchmark problems. One reason for RG's success is that the solutions to many of these benchmarks are dense in weight space. An analysis of the cases in which RG works well versus those in which it does not can serve to improve the quality of benchmarks for novel recurrent net algorithms.
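The abstract does not spell out the RG procedure, so the following is a minimal sketch of random weight guessing for a small recurrent net on a toy long-time-lag task. The task, the network size, the weight range, and all names (make_task, run_rnn, random_guess) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(n_examples=20, lag=50):
    """Toy long-time-lag task (illustrative): the label depends only on
    the first symbol, which must survive `lag` intervening steps."""
    X = rng.choice([-1.0, 1.0], size=(n_examples, lag, 1))
    y = (X[:, 0, 0] > 0).astype(float)
    return X, y

def run_rnn(weights, X, n_hidden=4):
    """Run a simple fully recurrent tanh net; return the sigmoid of the
    output unit after the final time step of each sequence."""
    W_in, W_rec, W_out = weights
    outs = []
    for seq in X:
        h = np.zeros(n_hidden)
        for x_t in seq:
            h = np.tanh(W_in @ x_t + W_rec @ h)
        outs.append(1.0 / (1.0 + np.exp(-(W_out @ h))))
    return np.array(outs)

def random_guess(max_trials=10_000, n_hidden=4, scale=10.0):
    """Random weight guessing (RG): sample all weights uniformly from
    [-scale, scale] and keep the first set that classifies every
    training example correctly. No gradients, no weight updates."""
    X, y = make_task()
    for trial in range(1, max_trials + 1):
        weights = (
            scale * (2 * rng.random((n_hidden, 1)) - 1),        # input -> hidden
            scale * (2 * rng.random((n_hidden, n_hidden)) - 1), # recurrent
            scale * (2 * rng.random(n_hidden) - 1),             # hidden -> output
        )
        pred = run_rnn(weights, X, n_hidden) > 0.5
        if np.all(pred == y):
            return trial, weights
    return None, None

trial, _ = random_guess()
print(f"solved after {trial} random guesses" if trial else "no solution within the trial budget")
```

If, as the abstract argues, solutions to such benchmarks are dense in weight space, a modest trial budget like the one above will frequently succeed, which is precisely why RG can embarrass more complex learning algorithms on these tasks.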
