Learning to Align Polyphonic Music

We describe an efficient learning algorithm for aligning a symbolic representation of a musical piece with its acoustic counterpart. Our method takes a supervised learning approach, using a training set of aligned symbolic and acoustic representations. The alignment function we devise is based on mapping the input acoustic-symbolic representation, along with the target alignment, into an abstract vector space. Building on techniques used for learning support vector machines (SVMs), our alignment function reduces to a classifier in the abstract vector space that separates correct alignments from incorrect ones. We describe a simple iterative algorithm for learning the alignment function and discuss its formal properties. We use our method to align MIDI and MP3 representations of polyphonic recordings of piano music. We also compare our discriminative approach to a generative method based on a generalization of hidden Markov models (HMMs). In all of our experiments, the discriminative method outperforms the HMM-based method.
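The abstract does not spell out the feature map or the update rule, so the following is only a minimal sketch of the general recipe it describes, not the paper's algorithm. It assumes a joint feature map φ(x, p, y) over an acoustic frame matrix x, a list of note pitches p, and a candidate alignment y (one strictly increasing onset frame per note). Prediction is the argmax of w·φ, computed by dynamic programming, and learning uses a margin-based, passive-aggressive-style update. The two per-note features (pitch-bin energy and its frame-to-frame rise), the cost (mean absolute onset error), and all function names are hypothetical illustrations.

```python
import numpy as np


def local_features(x, pitch, t):
    """Hypothetical per-note features for an onset at frame t:
    [energy in the note's pitch bin, rise in that energy from frame t-1]."""
    energy = x[t, pitch]
    prev = x[t - 1, pitch] if t > 0 else 0.0
    return np.array([energy, energy - prev])


def phi(x, pitches, onsets):
    """Joint feature map: sum of local features over all notes."""
    return sum(local_features(x, p, t) for p, t in zip(pitches, onsets))


def cost(y, y_hat):
    """Hypothetical alignment cost: mean absolute onset error in frames."""
    return float(np.mean(np.abs(np.array(y) - np.array(y_hat))))


def predict(w, x, pitches):
    """Return the strictly increasing onset sequence maximizing w . phi.

    Dynamic programming over (note, frame); assumes len(pitches) <= n_frames.
    """
    n_frames, k = x.shape[0], len(pitches)
    score = np.array([[w @ local_features(x, p, t) for t in range(n_frames)]
                      for p in pitches])
    best = np.full((k, n_frames), -np.inf)
    back = np.zeros((k, n_frames), dtype=int)
    best[0] = score[0]
    for i in range(1, k):
        run_val, run_arg = -np.inf, -1
        for t in range(1, n_frames):
            # Running max of best[i - 1, t'] over all t' < t.
            if best[i - 1, t - 1] > run_val:
                run_val, run_arg = best[i - 1, t - 1], t - 1
            best[i, t] = score[i, t] + run_val
            back[i, t] = run_arg
    # Backtrack from the best onset of the last note.
    t = int(np.argmax(best[k - 1]))
    onsets = [t]
    for i in range(k - 1, 0, -1):
        t = int(back[i, t])
        onsets.append(t)
    return onsets[::-1]


def train(examples, epochs=10, C=1.0):
    """Passive-aggressive-style updates: move w so that the margin
    w . (phi(y) - phi(y_hat)) reaches cost(y, y_hat), step capped at C."""
    w = np.zeros(2)
    for _ in range(epochs):
        for x, pitches, y in examples:
            y_hat = predict(w, x, pitches)
            delta = phi(x, pitches, y) - phi(x, pitches, y_hat)
            loss = cost(y, y_hat) - w @ delta
            sq_norm = delta @ delta
            if loss > 0 and sq_norm > 0:
                w = w + min(C, loss / sq_norm) * delta
    return w
```

For example, on an 8-frame input with three pitch bins where the notes (pitch bins 0, 2, 1) sound over frames 1 to 3, 4 to 5, and 6 to 7, a single update from w = 0 is enough for `predict` to return the onsets [1, 4, 6]. Energy alone cannot distinguish an onset from a sustained frame, which is why the sketch adds the rise feature. The dynamic program is exact here only because this hypothetical φ decomposes over notes; richer feature maps may need a different inference procedure.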
