A Discrete Probabilistic Memory Model for Discovering Dependencies in Time

Many domains of machine learning involve discovering dependencies and structure over time. In the most complex of domains, long-term temporal dependencies are present. Neural network models such as LSTM have been developed to deal with long-term dependencies, but the continuous nature of neural networks is not well suited to discrete symbol processing tasks. Further, the mathematical underpinnings of neural networks are unclear, and gradient descent learning of recurrent neural networks seems particularly susceptible to local optima. We introduce a novel architecture for discovering dependencies in time. The architecture is formed by combining two variants of the hidden Markov model (HMM), the factorial HMM and the input-output HMM, and adding a further strong constraint that requires the model to behave as a latch-and-store memory (the same constraint exploited in LSTM). This model, called an MIOFHMM, can learn structure that other variants of the HMM cannot, and can generalize better than LSTM on test sequences whose statistical properties (length, type of noise) differ from those of the training sequences. However, the MIOFHMM is slower to train and is more susceptible to local optima than LSTM.
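The latch-and-store constraint described above can be illustrated with a small sketch. The following Python example is illustrative only, not the authors' implementation: the function names, the epsilon noise level, and the encoding of "hold" versus "store" inputs are assumptions. It shows a single input-output HMM memory chain whose input-conditioned transition matrices either latch the previous hidden state or overwrite it with a new symbol, which is the kind of behavior the MIOFHMM constrains each memory component to exhibit.

```python
import numpy as np

# Sketch of one latch-and-store memory cell as an input-conditioned HMM chain.
# States: which symbol is currently stored (state 0 = nothing stored yet).
# Inputs: 0 = ordinary symbol (hold), k > 0 = "store symbol k".

n_symbols = 3
n_states = n_symbols + 1

def transition(x, eps=0.01):
    """Input-conditioned transition matrix T[i, j] = p(s_t = j | s_{t-1} = i, x_t = x).
    A hold input yields a near-identity matrix (the state is latched);
    a store input moves the chain to the state for that symbol."""
    T = np.full((n_states, n_states), eps / (n_states - 1))
    if x == 0:                      # hold: latch the previous state
        np.fill_diagonal(T, 1.0 - eps)
    else:                           # store symbol x, regardless of old state
        T[:, x] = 1.0 - eps
    return T

def forward(inputs):
    """Forward pass: belief over the stored symbol after each input."""
    belief = np.zeros(n_states)
    belief[0] = 1.0                 # start with nothing stored
    beliefs = []
    for x in inputs:
        belief = belief @ transition(x)
        belief /= belief.sum()
        beliefs.append(belief.copy())
    return np.array(beliefs)

# Store symbol 2, then hold it across a long gap of irrelevant inputs.
print(forward([2, 0, 0, 0, 0, 0, 0, 0]).round(3))
```

Running the sketch shows the belief concentrating on the stored symbol and remaining there across the gap, the discrete analogue of the gated memory that lets such models bridge long time lags.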
