The Sound of an Album Cover: Probabilistic Multimedia and IR

We present a novel, flexible statistical approach to jointly modeling music, images, and text. The technique is based on multi-modal mixture models, with efficient computation via online EM. The learned models can be used to browse multimedia databases; to query a multimedia database using any combination of music, images, and text (lyrics and other contextual information); to annotate documents with music and images; and to find documents in a database similar to input text, music, and/or graphics files.
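As a rough illustration of the idea, the sketch below fits a mixture model over several modalities with a stepwise (online) EM update and then computes a cluster posterior from whichever modalities are supplied in a query. It is not the authors' implementation: it assumes every modality has already been reduced to a vector of discrete token counts modeled by a multinomial, and the names `online_em`, `posterior`, `kappa`, and `smoothing` are hypothetical.

```python
import numpy as np

def _normalize_rows(s, smoothing):
    # Turn smoothed running counts into per-cluster multinomial parameters.
    s = s + smoothing
    return s / s.sum(axis=1, keepdims=True)

def posterior(pi, thetas, doc):
    """Cluster posterior for one document; a modality set to None is skipped,
    so a query may use any combination of the available modalities."""
    log_r = np.log(pi)
    for counts, theta in zip(doc, thetas):
        if counts is not None:
            log_r = log_r + np.log(theta) @ counts
    log_r = log_r - log_r.max()  # stabilize before exponentiating
    r = np.exp(log_r)
    return r / r.sum()

def online_em(docs, n_clusters, vocab_sizes, kappa=0.7, smoothing=1e-2, seed=0):
    """Stepwise EM: one pass over docs, updating running sufficient statistics.
    docs yields tuples of count vectors, one vector per modality (assumption)."""
    rng = np.random.default_rng(seed)
    s_pi = np.full(n_clusters, 1.0 / n_clusters)
    # Slightly perturbed initial counts break the symmetry between clusters.
    s_counts = [rng.random((n_clusters, v)) + 1.0 for v in vocab_sizes]
    for t, doc in enumerate(docs):
        pi = s_pi / s_pi.sum()
        thetas = [_normalize_rows(s, smoothing) for s in s_counts]
        r = posterior(pi, thetas, doc)       # E-step for this document only
        eta = (t + 2.0) ** (-kappa)          # decaying step size
        s_pi = (1.0 - eta) * s_pi + eta * r  # interpolate the statistics
        for m, counts in enumerate(doc):
            s_counts[m] = (1.0 - eta) * s_counts[m] + eta * np.outer(r, counts)
    pi = s_pi / s_pi.sum()
    return pi, [_normalize_rows(s, smoothing) for s in s_counts]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in data: two modalities with vocabularies of size 6 and 4.
    data = [(rng.integers(0, 5, size=6).astype(float),
             rng.integers(0, 5, size=4).astype(float)) for _ in range(200)]
    pi, thetas = online_em(data, n_clusters=3, vocab_sizes=[6, 4])
    # Query with the first modality only.
    print(posterior(pi, thetas, (data[0][0], None)))
```

Because each document updates the running statistics once and is then discarded, the whole database never has to be held in memory. Skipping a missing modality in `posterior` is what lets a query use text alone, music alone, or any mix.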
