Unsupervised Learning in Recurrent Neural Networks

While much work has been done on unsupervised learning in feedforward neural network architectures, its potential with (theoretically more powerful) recurrent networks and time-varying inputs has rarely been explored. Here we train Long Short-Term Memory (LSTM) recurrent networks to maximize two information-theoretic objectives for unsupervised learning: Binary Information Gain Optimization (BINGO) and Nonparametric Entropy Optimization (NEO). LSTM learns to discriminate different
