Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine

Existing content-based music similarity estimation methods largely build on complex hand-crafted feature extractors, which are difficult to engineer. As an alternative, unsupervised machine learning allows features to be learned empirically from data. We train a recently proposed model, the mean-covariance Restricted Boltzmann Machine, on music spectrogram excerpts and employ it for music similarity estimation. In k-NN-based genre retrieval experiments on three datasets, it clearly outperforms MFCC-based methods, beats simple unsupervised feature extraction using k-Means, and comes close to the state of the art. This shows that unsupervised feature extraction is a viable alternative to engineered features.
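
To make the k-NN genre retrieval evaluation more concrete, the following is a minimal sketch of how such a score could be computed from song-level feature vectors. It is not the authors' implementation: the function name `knn_genre_precision`, the use of Euclidean distance, and the toy data are assumptions made purely for illustration, and the abstract does not specify the distance measure or features actually compared.

```python
import numpy as np

def knn_genre_precision(features, genres, k=1):
    """Mean fraction of each song's k nearest neighbours that share its genre.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    features = np.asarray(features, dtype=float)
    genres = np.asarray(genres)
    # Pairwise squared Euclidean distances between all songs.
    sq_norms = (features ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    # A query song must not retrieve itself.
    np.fill_diagonal(dists, np.inf)
    # Indices of the k closest songs for every query.
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # True wherever a retrieved neighbour has the query's genre.
    hits = genres[neighbours] == genres[:, None]
    return float(hits.mean())

# Toy data: two well-separated synthetic "genres" of 20 songs each.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0]])
genres = np.repeat([0, 1], 20)
features = centres[genres] + rng.normal(scale=0.5, size=(40, 2))
precision = knn_genre_precision(features, genres, k=5)
```

In this sketch, any feature extractor (MFCC statistics, k-Means activations, or learned model activations) would supply the rows of `features`; the score is the average share of same-genre songs among the k retrieved neighbours.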
