Finding temporal structure in music: blues improvisation with LSTM recurrent networks

We consider the problem of extracting essential ingredients of music signals, such as a well-defined global temporal structure in the form of nested periodicities (or meter). We investigate whether we can construct an adaptive signal processing device that learns by example how to generate new instances of a given musical style. Because recurrent neural networks (RNNs) can, in principle, learn the temporal structure of a signal, they are good candidates for such a task. Unfortunately, music composed by standard RNNs often lacks global coherence. The reason for this failure seems to be that RNNs cannot keep track of temporally distant events that indicate global music structure. Long short-term memory (LSTM) has succeeded in similar domains where other RNNs have failed, such as timing and counting and the learning of context-sensitive languages. We show that LSTM is also a good mechanism for learning to compose music. We present experimental results showing that LSTM successfully learns a form of blues music and is able to compose novel (and we believe pleasing) melodies in that style. Remarkably, once the network has found the relevant structure, it does not drift from it: LSTM is able to play the blues with good timing and proper structure as long as one is willing to listen.
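The core mechanism the abstract appeals to is that an LSTM cell's gated state can carry information across many time steps, which plain RNNs lose, and that music can then be generated by repeatedly predicting the next note. The sketch below is a minimal illustration of that idea, not the paper's actual architecture: the layer sizes, the one-hot note encoding, the random weights, and the greedy sampling are all illustrative assumptions, and a real model would be trained on a corpus rather than initialized at random.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of an LSTM cell with input, forget, and output gates.

    W has shape (4*H, D+H), b has shape (4*H,), where D is the input size
    and H the number of hidden units. The forget gate f is what lets the
    cell state c retain (or discard) temporally distant information.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2 * H])     # forget gate
    o = sigmoid(z[2 * H:3 * H]) # output gate
    g = np.tanh(z[3 * H:])      # candidate cell update
    c = f * c_prev + i * g      # cell state: the long-term memory path
    h = o * np.tanh(c)          # hidden state fed to the output layer
    return h, c

# Illustrative sizes: a 13-symbol note alphabet (e.g. 12 pitches + rest)
# and 8 hidden units; these numbers are assumptions, not the paper's.
D, H = 13, 8
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
W_out = rng.normal(scale=0.1, size=(D, H))  # hidden -> note logits

# Generate a short melody by feeding each predicted note back as input.
h, c = np.zeros(H), np.zeros(H)
melody = [0]  # start symbol (arbitrary)
for _ in range(16):
    x = np.zeros(D)
    x[melody[-1]] = 1.0                 # one-hot encode the previous note
    h, c = lstm_step(x, h, c, W, b)
    logits = W_out @ h
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                # softmax over next-note candidates
    melody.append(int(np.argmax(probs)))  # greedy choice; sampling also works
```

With trained weights, the same prediction loop is what lets the network keep a 12-bar form: the cell state `c` can hold where in the chord structure the melody currently is, so the next-note distribution stays consistent with the global structure.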
