Feature extraction by incremental parsing for music indexing

In this paper, we apply a linguistic-processing approach to the content-based retrieval of music information. Central to the approach is a lossy version of the Lempel-Ziv incremental parsing (LZIP) algorithm, which constructs a dictionary by incrementally parsing music feature vectors. LZIP is adopted as a source characterization technique owing to its universal-coding nature and its asymptotic convergence to the entropy of the source. The dictionary is composed of variable-length parsed representations, which are used to construct a highly sparse co-occurrence matrix that counts the occurrences of each parsed representation in each music piece. As a feature analysis framework, Latent Semantic Analysis (LSA) is then applied to the co-occurrence matrix to generate a lower-dimensional approximation that exposes the most salient features of the represented audio documents. Combined with reduced sampling rates and quantized feature vectors, this approach yields a system with lower processing and storage requirements and greater tolerance to noisy queries. We demonstrate the performance of the system on the music genre classification problem and analyze its robustness to perturbed queries. Moreover, we show that forming the audio dictionary with the incremental parsing algorithm yields better retrieval performance than techniques that produce a dictionary with fixed-length entries, such as vector quantization.
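As an illustration of the dictionary-building step, the following is a minimal sketch of standard (lossless) Lempel-Ziv incremental parsing followed by a phrase-by-track count matrix. It is not the lossy LZIP variant described above: it assumes each track has already been reduced to a sequence of integer codeword indices, matches phrases exactly, and parses each track independently. The function names `lz_incremental_parse` and `cooccurrence_counts` are hypothetical, chosen for this example only.

```python
from collections import Counter


def lz_incremental_parse(symbols):
    """Split a symbol sequence into LZ78-style phrases.

    Each phrase is the shortest prefix of the remaining input that has
    not been seen as a phrase before, so phrase lengths vary.
    """
    dictionary = set()
    phrases = []
    current = ()
    for s in symbols:
        current += (s,)
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ()
    # A trailing fragment that is already in the dictionary is kept as a repeat.
    if current:
        phrases.append(current)
    return phrases


def cooccurrence_counts(tracks):
    """Return (vocabulary, rows), where rows[i][j] counts phrase j in track i.

    Assumption: every track is parsed on its own and the vocabulary is the
    union of all phrases found; the matrix is mostly zeros for real data.
    """
    parsed = [Counter(lz_incremental_parse(t)) for t in tracks]
    vocabulary = sorted(set().union(*parsed))
    rows = [[counts[p] for p in vocabulary] for counts in parsed]
    return vocabulary, rows


if __name__ == "__main__":
    # Toy quantized feature sequences standing in for two tracks.
    tracks = [[0, 0, 1, 0, 1, 2], [0, 0, 1, 1]]
    vocabulary, rows = cooccurrence_counts(tracks)
    for phrase, column in zip(vocabulary, zip(*rows)):
        print(phrase, column)
```

In a full pipeline, the resulting matrix would then be passed to a truncated singular value decomposition to obtain the lower-dimensional LSA representation; that step is omitted here to keep the sketch dependency-free.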
