Universal Onset Detection with Bidirectional Long Short-Term Memory Neural Networks

Many different onset detection methods have been proposed in recent years. However, those that perform well tend to be highly specialised for certain types of music, while those that are more widely applicable give only moderate performance. In this paper, we present a new onset detector with superior performance and temporal precision for all kinds of music, including complex music mixes. It is based on auditory spectral features and relative spectral differences, which are processed by a bidirectional Long Short-Term Memory recurrent neural network that acts as a reduction function. The network is trained on a large database of onset data covering various genres and onset types. Because the approach is data-driven, it does not require the onset detection method and its parameters to be tuned to a particular type of music. We compare results on the Bello onset data set and conclude that our approach is on par with related results on the same set and outperforms them in most cases in terms of F1-measure. For complex music with mixed onset types, an absolute improvement of 3.6% is reported.
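To make the F1-measure comparison concrete, the following is a minimal sketch of how detected onsets might be scored against annotated ones. It is not the evaluation code used in the paper: the function names, the greedy one-to-one matching, and the tolerance of 25 ms are all assumptions chosen for illustration, and the onset times are invented.

```python
def match_onsets(detected, reference, tolerance=0.025):
    """Greedily pair detections with reference onsets, one-to-one.

    Times are in seconds. `tolerance` is a hypothetical matching window,
    not a value taken from the paper.
    Returns (true_positives, false_positives, false_negatives).
    """
    detected = sorted(detected)
    reference = sorted(reference)
    i = j = tp = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tolerance:
            # Detection falls inside the window: count a hit, consume both.
            tp += 1
            i += 1
            j += 1
        elif diff < 0:
            # Detection is too early for this reference: false positive.
            i += 1
        else:
            # Reference has no detection near it: false negative.
            j += 1
    fp = len(detected) - tp
    fn = len(reference) - tp
    return tp, fp, fn


def f1_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when nothing matched."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example data, in seconds.
    detected = [0.10, 0.52, 0.90, 1.40]
    reference = [0.11, 0.50, 1.00, 1.41]
    tp, fp, fn = match_onsets(detected, reference)
    print(tp, fp, fn, f1_measure(tp, fp, fn))
```

Under this scheme each annotated onset can be matched by at most one detection, so a double detection of the same onset counts as a false positive rather than inflating the score.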
