Paper information - Discriminative training of hidden Markov models for multiple pitch tracking [speech processing examples]

We present a multiple pitch tracking algorithm that is based on direct probabilistic modeling of the spectrogram of the signal. The model is a factorial hidden Markov model whose parameters are learned discriminatively from the Keele pitch database. Our algorithm can track several pitches and determine the number of pitches that are active at any given time. We present simulation results on mixtures of several speech signals and noise, showing the robustness of our approach.
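The decoding side of a factorial HMM can be illustrated with a small sketch. This is not the authors' model. It is a toy under stated assumptions. Each of two chains is either silent (state 0) or sitting on one quantized pitch bin. The chains evolve independently, so the joint log-transition is a sum of per-chain terms. Viterbi search over the product state space then yields both the pitch values and the number of active pitches per frame. The per-bin "salience" inputs come from a hypothetical front end. The transition probabilities are hand-set, not learned discriminatively from the Keele database. The names `chain_transitions`, `log_emission`, and `decode_pitch_tracks` are all hypothetical.

```python
import itertools
import math


def safe_log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def chain_transitions(n_bins, p_on=0.1, p_off=0.1, p_move=0.1):
    """Transition matrix for ONE chain (illustrative values, not learned).

    State 0 = chain silent; states 1..n_bins = quantized pitch bins.
    """
    n = n_bins + 1
    T = [[0.0] * n for _ in range(n)]
    T[0][0] = 1.0 - p_on                  # stay silent
    for b in range(1, n):
        T[0][b] = p_on / n_bins           # switch on at any pitch bin
    for b in range(1, n):
        T[b][0] = p_off                   # switch off
        for nb in (b - 1, b + 1):
            if 1 <= nb <= n_bins:
                T[b][nb] = p_move         # drift to a neighboring bin
        T[b][b] = 1.0 - sum(T[b])         # remaining mass: keep the same bin
    return T


def log_emission(frame, joint, eps=1e-3):
    """Toy observation score: each bin's salience is treated as the
    probability that the bin holds an active pitch (an assumption)."""
    active = [b for b in joint if b != 0]
    if len(set(active)) < len(active):
        return -math.inf                  # two chains on one bin: disallowed here
    score = 0.0
    for b, sal in enumerate(frame, start=1):
        sal = min(max(sal, eps), 1.0 - eps)
        score += math.log(sal) if b in active else math.log(1.0 - sal)
    return score


def decode_pitch_tracks(frames, n_bins, n_chains=2):
    """Viterbi decoding over the product state space of a factorial HMM.

    Returns, for each frame, the sorted list of active pitch bins; its
    length is the estimated number of simultaneous pitches.
    """
    logT = [[safe_log(p) for p in row] for row in chain_transitions(n_bins)]
    states = list(itertools.product(range(n_bins + 1), repeat=n_chains))
    log_init = n_chains * math.log(1.0 / (n_bins + 1))  # uniform start per chain
    delta = {s: log_init + log_emission(frames[0], s) for s in states}
    backptrs = []
    for frame in frames[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            best_prev, best_score = states[0], -math.inf
            for p in states:
                # Chains evolve independently, so the joint log-transition is a sum.
                score = delta[p] + sum(logT[a][b] for a, b in zip(p, s))
                if score > best_score:
                    best_prev, best_score = p, score
            new_delta[s] = best_score + log_emission(frame, s)
            ptr[s] = best_prev
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final joint state.
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return [sorted({b for b in s if b != 0}) for s in path]


if __name__ == "__main__":
    # Hypothetical salience frames over 5 bins: one pitch, then a second one enters.
    frames = [[0.05, 0.9, 0.05, 0.05, 0.05]] * 3 + [[0.05, 0.9, 0.05, 0.05, 0.9]] * 3
    print(decode_pitch_tracks(frames, n_bins=5))
    # prints [[2], [2], [2], [2, 5], [2, 5], [2, 5]]
```

The brute-force search visits every pair of joint states, so its cost grows as (K+1)^(2M) per frame for M chains and K bins. That is acceptable for a toy but is why practical factorial-HMM inference usually exploits the per-chain factorization or uses approximate inference.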

Michael I. Jordan | Francis R. Bach