Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics

The short history of content-based music informatics research is dominated by hand-crafted feature design, and our community has admittedly grown complacent with a few de facto standards. Despite commendable progress in many areas, it is increasingly apparent that our efforts are yielding diminishing returns. We attribute this deceleration largely to the tandem of heuristic feature design and shallow processing architectures. We systematically discard information we hope is irrelevant while calling upon creativity, intuition, or sheer luck to craft useful representations, gradually evolving complex, carefully tuned systems to address specific tasks. While other disciplines have already seen the benefits of deep learning, it has only recently begun to be explored in our field. By reviewing deep architectures and feature learning, we hope to raise awareness in our community of alternative approaches to solving MIR challenges, new and old alike.
