Audiovisual Speech Synthesis

This paper surveys the main approaches used to synthesize talking faces and provides greater detail on a selection of them. We distinguish between facial synthesis itself (i.e. the manner in which facial movements are rendered on a computer screen) and the way these movements may be controlled and predicted from phonetic input. The two main synthesis techniques, model-based and image-based, are contrasted and illustrated by brief descriptions of the most representative existing systems. Finally, the challenging issues of evaluation, data acquisition, and modeling that may drive future models are discussed and illustrated by our current work at ICP.