Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine

Existing content-based music similarity estimation methods largely build on complex hand-crafted feature extractors, which are difficult to engineer. As an alternative, unsupervised machine learning makes it possible to learn features empirically from data. We train a recently proposed model, the mean-covariance Restricted Boltzmann Machine (mcRBM), on music spectrogram excerpts and employ it for music similarity estimation. In k-NN based genre retrieval experiments on three datasets, it clearly outperforms MFCC-based methods, beats simple unsupervised feature extraction using k-means, and comes close to the state of the art. This shows that unsupervised feature extraction is a viable alternative to engineered features.
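The k-NN based genre retrieval evaluation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each song has already been reduced to a fixed-length feature vector (the actual song-level representation and distance measure are not given here), compares songs by Euclidean distance, and scores each query song by the fraction of its k nearest neighbours, excluding the query itself, that share its genre label. The function name `knn_genre_precision`, the Euclidean distance and the default `k=5` are all hypothetical choices.

```python
import math
from typing import Sequence


def knn_genre_precision(features: Sequence[Sequence[float]],
                        genres: Sequence[str], k: int = 5) -> float:
    """Mean fraction of each song's k nearest neighbours sharing its genre.

    Hypothetical helper: songs are assumed to be fixed-length vectors
    compared with Euclidean distance; each query is left out of its own
    neighbour list.
    """
    n = len(features)
    if n != len(genres):
        raise ValueError("features and genres must have the same length")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of songs")
    total = 0.0
    for i in range(n):
        # Distances from query i to every other song (leave-one-out).
        dists = sorted((math.dist(features[i], features[j]), j)
                       for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        # Count neighbours whose genre label matches the query's label.
        hits = sum(1 for j in neighbours if genres[j] == genres[i])
        total += hits / k
    return total / n


if __name__ == "__main__":
    # Toy data: two well-separated clusters of two songs each.
    songs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    labels = ["jazz", "jazz", "rock", "rock"]
    print(knn_genre_precision(songs, labels, k=1))  # 1.0
    print(knn_genre_precision(songs, labels, k=2))  # 0.5
```

With `k=1` every song's nearest neighbour is the other song in its cluster, so precision is 1.0; with `k=2` each query must also take one song from the other genre, halving the score. Under this kind of protocol, a better feature extractor shows up as a higher average precision for a given k.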

Christian Osendorfer | Jan Schlüter
