Misha Denil | Nando de Freitas | Miaosen Wang | Yu Zhang | Yutian Chen | Yusuf Aytar | Yi Yang | Brendan Shillingford | Yannis Assael | Eren Sezener | Wendi Liu | Luis C. Cobo
[1] Adam Finkelstein, et al. Text-based editing of talking-head video, 2019, ACM Trans. Graph.
[2] Jon Barker, et al. An audio-visual corpus for speech perception and automatic speech recognition, 2006, The Journal of the Acoustical Society of America.
[3] Justus Thies, et al. Neural Voice Puppetry: Audio-driven Facial Reenactment, 2019, ECCV.
[4] Hans-Peter Seidel, et al. Neural style-preserving visual dubbing, 2019, ACM Trans. Graph.
[5] Ira Kemelmacher-Shlizerman, et al. Synthesizing Obama: Learning Lip Sync from Audio, 2017, ACM Trans. Graph.
[6] Hang Zhou, et al. Talking Face Generation by Adversarially Disentangled Audio-Visual Representation, 2018, AAAI.
[7] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.
[8] H. McGurk, et al. Hearing lips and seeing voices, 1976, Nature.
[9] Hank Liao, et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription, 2013, IEEE Workshop on Automatic Speech Recognition and Understanding.
[10] Fred L. Drake, et al. Python 3 Reference Manual, 2009.
[11] Jeff Donahue, et al. Efficient Video Generation on Complex Datasets, 2019, arXiv.
[12] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.
[13] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[14] Hao Zhu, et al. High-Resolution Talking Face Generation via Mutual Information Approximation, 2018, arXiv.
[15] Joon Son Chung, et al. VoxCeleb: Large-scale speaker verification in the wild, 2020, Comput. Speech Lang.
[16] Arkadiusz Stopczynski, et al. AVA Active Speaker: An Audio-Visual Dataset for Active Speaker Detection, 2019, ICASSP 2020.
[17] Naomi Harte, et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech, 2015, IEEE Transactions on Multimedia.
[18] Yoshua Bengio, et al. ObamaNet: Photo-realistic lip-sync from text, 2017, arXiv.
[19] Maja Pantic, et al. Realistic Speech-Driven Facial Animation with GANs, 2019, International Journal of Computer Vision.
[20] Andrew Zisserman, et al. X2Face: A network for controlling face generation by using images, audio, and pose codes, 2018, ECCV.
[21] Joon Son Chung, et al. You said that?, 2017, BMVC.
[22] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, INTERSPEECH.
[23] Chenliang Xu, et al. Lip Movements Generation at a Glance, 2018, ECCV.
[24] Jordi Torres, et al. Wav2Pix: Speech-conditioned Face Generation Using Generative Adversarial Networks, 2019, ICASSP.
[25] Satoshi Nakamura, et al. Lip movement synthesis from speech based on hidden Markov models, 1998, IEEE International Conference on Automatic Face and Gesture Recognition.
[26] Jaakko Lehtinen, et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion, 2017, ACM Trans. Graph.
[27] Bartholomäus Wissmath, et al. Dubbing or Subtitling? Effects on Spatial Presence, Transportation, Flow, and Enjoyment, 2009.
[28] Hujun Bao, et al. Audio-driven Talking Face Video Generation with Natural Head Pose, 2020, arXiv.
[29] Chen Change Loy, et al. Everybody’s Talkin’: Let Me Talk as You Want, 2020, IEEE Transactions on Information Forensics and Security.
[30] Jeff Donahue, et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018, ICLR.
[31] Jae Hyun Lim, et al. Geometric GAN, 2017, arXiv.
[32] Gaël Varoquaux, et al. The NumPy Array: A Structure for Efficient Numerical Computation, 2011, Computing in Science & Engineering.
[33] Heiga Zen, et al. Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning, 2019, INTERSPEECH.
[34] Thomas Paine, et al. Large-Scale Visual Speech Recognition, 2018, INTERSPEECH.
[35] Jesús Chamorro-Martínez, et al. Diatom autofocusing in brightfield microscopy: a comparative study, 2000, ICPR.
[36] Chenliang Xu, et al. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss, 2019, CVPR.
[37] Cees M. Koolstra, et al. The Pros and Cons of Dubbing and Subtitling, 2002.
[38] Victor Lempitsky, et al. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models, 2019, ICCV.
[39] C. V. Jawahar, et al. Cross-language Speech Dependent Lip-synchronization, 2019, ICASSP.
[40] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[41] Eero P. Simoncelli, et al. Image quality assessment: from error visibility to structural similarity, 2004, IEEE Transactions on Image Processing.
[42] Jan Kautz, et al. Loss Functions for Image Restoration With Neural Networks, 2017, IEEE Transactions on Computational Imaging.
[43] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, ICASSP 2018.
[44] Jingwen Zhu, et al. Talking Face Generation by Conditional Recurrent Adversarial Network, 2018, IJCAI.
[45] Heiga Zen, et al. Sample Efficient Adaptive Text-to-Speech, 2018, ICLR.
[46] Joon Son Chung, et al. You Said That?: Synthesising Talking Faces from Audio, 2019, International Journal of Computer Vision.