Hidden Markov Models for the Estimation of Multiple Fundamental Frequencies

Abstract. We introduce an algorithm for estimating the fundamental frequency of audio signals. It models the spectrogram of the signal with a factorial hidden Markov model whose parameters are estimated discriminatively from the Keele database (Plante et al., 1995). The algorithms presented can track several fundamental frequencies and determine how many frequencies are present at each time instant. Simulation results on mixtures of speech signals and noise illustrate the robustness of the approach.
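
The abstract does not spell out how a pitch track is decoded from the model. As a minimal sketch only, the code below runs the standard Viterbi algorithm on a single Markov chain whose states are pitch bins plus one "no pitch" state. It is not the factorial model or the discriminative training of the paper; the function name `viterbi`, the state layout, and the input arrays are all assumptions made for illustration.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely state path for one Markov chain (illustrative sketch).

    log_init:  (S,)   log prior over states at the first frame
    log_trans: (S, S) log_trans[i, j] = log p(state j at t+1 | state i at t)
    log_emit:  (T, S) per-frame log score of each state given the spectrogram
    Assumption: state 0 means "no pitch present", states 1..S-1 are pitch bins.
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]            # best log score ending in each state
    backptr = np.zeros((T, S), dtype=int)     # best predecessor per frame and state
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # scores[i, j]: come from i, go to j
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores[backptr[t], np.arange(S)] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path
```

With transition probabilities that favour staying in the same state, the decoded path smooths over isolated frames whose local evidence is ambiguous, and frames decoded as state 0 can be read as "no pitch present". Tracking several pitches at once would need one such chain per source, which is what a factorial model provides.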

[1] Bin Yu, et al. Maximum pseudo likelihood estimation in network tomography, 2003, IEEE Transactions on Signal Processing.

[2] Fabrice Plante, et al. A pitch extraction reference database, 1995, EUROSPEECH.

[3] G. Wahba. Spline models for observational data, 1990.

[4] Q. Summerfield. Book Review: Auditory Scene Analysis: The Perceptual Organization of Sound, 1992.

[5] Scott Rickard, et al. Blind separation of speech mixtures via time-frequency masking, 2004, IEEE Transactions on Signal Processing.

[6] Shlomo Dubnov, et al. Maximum a-posteriori probability pitch tracking in noisy environments using harmonic model, 2004, IEEE Transactions on Speech and Audio Processing.

[7] Xiao Li, et al. Graphical model approach to pitch tracking, 2004, INTERSPEECH.

[8] Michael I. Jordan, et al. Discriminative training of hidden Markov models for multiple pitch tracking, 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[9] Thomas F. Quatieri, et al. Pitch estimation and voicing detection based on a sinusoidal speech model, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[10] Guy J. Brown, et al. A multi-pitch tracking algorithm for noisy speech, 2002, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 2001.

[12] Simon J. Godsill, et al. Polyphonic pitch tracking using joint Bayesian estimation of multiple frame parameters, 1999, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA '99).

[13] Lalit R. Bahl, et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition, 1986, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86).

[14] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.