Inversion of F0 model for natural-sounding speech synthesis

Natural-sounding speech synthesizers require a model that describes prosody quantitatively. H. Fujisaki's model (see "Dynamic Characteristics of Voice Fundamental Frequency in Speech and Singing", The Production of Speech, Springer-Verlag, p. 39-47, 1983) has shown considerable accuracy on many languages (Fujisaki et al., IEEE Int. Conf. on Acoustics, Speech and Sig. Processing, vol. 2, p. 211-14, 1993; Fujisaki and Ohno, Fourth Int. Conf. on Sig. Processing, vol. 1, p. 714-17, 1998). We propose a method for estimating the parameters of Fujisaki's model, i.e., an inversion method, based on the relative extrema of the pitch contour followed by a gradient-based refinement procedure. Preliminary results show excellent performance of the proposed method in matching the pitch contours, and preliminary synthesis results using the estimated parameters are very encouraging.
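For readers unfamiliar with the model being inverted, the following is a minimal sketch of the forward Fujisaki model in its commonly published form: log F0 is a baseline plus phrase components (impulse responses) and accent components (clipped step responses). The function names, the parameter layout, and the default values of alpha, beta, and gamma are illustrative assumptions, not taken from this paper. The squared log-F0 error at the end is likewise only one plausible objective for a gradient refinement; the abstract does not state which criterion the authors minimize.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response, clipped at the ceiling gamma, for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb      -- baseline frequency in Hz
    phrases -- list of (T0, Ap): onset time and magnitude of each phrase command
    accents -- list of (T1, T2, Aa): onset, offset, and amplitude of each accent command
    """
    log_f0 = math.log(fb)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)


def squared_log_error(times, observed_f0, fb, phrases, accents):
    """Sum of squared log-F0 differences (assumed objective, see lead-in)."""
    return sum(
        (math.log(f_obs) - math.log(fujisaki_f0(t, fb, phrases, accents))) ** 2
        for t, f_obs in zip(times, observed_f0)
    )
```

Inversion runs this model backwards: given a measured pitch contour, it searches for the command times and amplitudes whose generated contour lies closest to the measurement.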

[1] Eyal Yair, et al. Super resolution pitch determination of speech signals, 1991, IEEE Trans. Signal Process.

[2] Hiroshi Murata, et al. Analysis and modeling of word accent and sentence intonation in Swedish, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3] Federico Albano Leoni. Tre progetti per l'italiano parlato, 2003.

[4] Juan Manuel Montero-Martínez, et al. New rule-based and data-driven strategy to incorporate Fujisaki's F0 model to a text-to-speech system in Castillian Spanish, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[5] Eric Moulines, et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, 1989, Speech Commun.

[6] H. Fujisaki, et al. The use of a generative model of F0 contours for multilingual speech synthesis, 1998, ICSP '98. 1998 Fourth International Conference on Signal Processing (Cat. No.98TH8344).

[7] Hiroya Fujisaki, et al. Dynamic Characteristics of Voice Fundamental Frequency in Speech and Singing, 1983.

[8] Keikichi Hirose, et al. Detection of phrase boundaries in Japanese by low-pass filtering of fundamental frequency contours, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[9] Sumio Ohno, et al. A method for automatic extraction of parameters of the fundamental frequency contour, 2000, INTERSPEECH.

[10] W. Cooper, et al. Fundamental frequency contours at syntactic boundaries, 1977, The Journal of the Acoustical Society of America.

[11] Hansjörg Mixdorff, et al. A novel approach to the fully automatic extraction of Fujisaki model parameters, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).