Thomas Paine | Matthew W. Hoffman | Nando de Freitas | Hank Liao | Ben Laurie | Andrew W. Senior | Hasim Sak | Utsav Prabhu | Brendan Shillingford | Kanishka Rao | Cían Hughes | Yannis M. Assael | Ben Coppin | Lorrayne Bennett | Marie Mulville
[1] Alex Pentland,et al. Automatic lipreading by optical-flow analysis , 1989 .
[2] Alan Jeffrey Goldschen,et al. Continuous automatic speech recognition by lipreading , 1993 .
[3] John C. Wells,et al. Computer-coding the IPA: a proposed extension of SAMPA , 1995 .
[4] Javier R. Movellan,et al. Dynamic Features for Visual Speechreading: A Systematic Comparison , 1996, NIPS.
[5] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[6] Gerasimos Potamianos,et al. Speaker independent audio-visual database for bimodal ASR , 1997, AVSP.
[7] Gerasimos Potamianos,et al. An image transform approach for HMM based automatic lipreading , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).
[8] Gerasimos Potamianos,et al. Discriminative training of HMM stream exponents for audio-visual speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[9] Thomas S. Huang,et al. Bimodal speech recognition using coupled hidden Markov models , 2000, INTERSPEECH.
[10] Jesús Chamorro-Martínez,et al. Diatom autofocusing in brightfield microscopy: a comparative study , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.
[11] Fernando Pereira,et al. Weighted finite-state transducers in speech recognition , 2002, Comput. Speech Lang..
[12] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[13] Sadaoki Furui,et al. Audio-visual speech recognition using lip movement extracted from side-face images , 2003, AVSP.
[14] Gabriel Fernandez,et al. Video Shot Boundary Detection Based on Color Histogram , 2003, TREC Video Retrieval Evaluation.
[15] Sadaoki Furui,et al. Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images , 2004, J. VLSI Signal Process..
[16] Juergen Luettin,et al. Audio-Visual Automatic Speech Recognition: An Overview , 2004 .
[17] Yoni Bauduin,et al. Audio-Visual Speech Recognition , 2004 .
[18] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition , 2006, The Journal of the Acoustical Society of America.
[19] Petros Maragos,et al. Adaptive multimodal fusion by uncertainty compensation , 2006, INTERSPEECH.
[20] Sridha Sridharan,et al. Patch-Based Representation of Visual Speech , 2006 .
[21] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[22] Petros Maragos,et al. Multimodal Fusion and Learning with Uncertain Features Applied to Audiovisual Speech Recognition , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.
[23] Alan Wee-Chung Liew,et al. An Automatic Lipreading System for Spoken Digits With Limited Training Data , 2008, IEEE Transactions on Circuits and Systems for Video Technology.
[24] Jean-Philippe Thiran,et al. Information Theoretic Feature Extraction for Audio-Visual Speech Recognition , 2009, IEEE Transactions on Signal Processing.
[25] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2009, IEEE Transactions on Multimedia.
[26] Petros Maragos,et al. Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[27] Takeshi Saitoh,et al. A study of influence of word lip reading by change of frame rate , 2010, AVSP.
[28] Craig Chambers,et al. FlumeJava: easy, efficient data-parallel pipelines , 2010, PLDI '10.
[29] Jayavardhana Gubbi,et al. Lip reading using optical flow and support vector machines , 2010, 2010 3rd International Congress on Image and Signal Processing.
[30] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[31] Tara N. Sainath,et al. Fundamental Technologies in Modern Speech Recognition , 2012, IEEE Signal Processing Magazine.
[32] Hank Liao,et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[33] Ahmad B. A. Hassanat,et al. Visual Speech Recognition , 2011, ArXiv.
[34] A. Kho,et al. Silence in the EHR: infrequent documentation of aphonia in the electronic health record , 2014, BMC Health Services Research.
[35] Tetsuya Ogata,et al. Lipreading using convolutional neural network , 2014, INTERSPEECH.
[36] Matti Pietikäinen,et al. A review of recent advances in visual speech decoding , 2014, Image Vis. Comput..
[37] Barry-John Theobald,et al. The effect of speaking rate on audio and visual speech , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Thomas Brox,et al. Striving for Simplicity: The All Convolutional Net , 2014, ICLR.
[39] Hermann Ney,et al. Deep Learning of Mouth Shapes for Sign Language , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).
[40] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Satoshi Tamura,et al. Integration of deep bottleneck features for audio-visual speech recognition , 2015, INTERSPEECH.
[42] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[43] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[44] Mohammed Bennamoun,et al. Listening with Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[45] Tara N. Sainath,et al. Large vocabulary automatic speech recognition for children , 2015, INTERSPEECH.
[46] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[47] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, ArXiv.
[48] Robert M. Nickel,et al. Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR , 2016, INTERSPEECH.
[49] Tetsuya Takiguchi,et al. Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss , 2016, INTERSPEECH.
[50] Ian McGraw,et al. Personalized speech recognition on mobile devices , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[51] Abhinav Thanda,et al. Audio Visual Speech Recognition Using Deep Recurrent Neural Networks , 2016, MPRSS.
[52] Maja Pantic,et al. Deep complementary bottleneck features for visual speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Jürgen Schmidhuber,et al. Lipreading with long short-term memory , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[55] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[56] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[57] Brian Roark,et al. Learning N-Gram Language Models from Uncertain Data , 2016, INTERSPEECH.
[58] Richard Harvey,et al. Decoding visemes: Improving machine lip-reading , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Stephen J. Cox,et al. Improved speaker independent lip reading using speaker adaptive training and deep neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[60] Amirsina Torfi,et al. 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition , 2017, IEEE Access.
[61] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[62] Richard Harvey,et al. Phoneme-to-viseme mappings: the good, the bad, and the ugly , 2017, Speech Commun..
[63] Shmuel Peleg,et al. Vid2speech: Speech reconstruction from silent video , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[64] Karel Palecek. Utilizing Lipreading in Large Vocabulary Continuous Speech Recognition , 2017, SPECOM.
[65] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[66] Joon Son Chung,et al. Lip Reading in Profile , 2017, BMVC.
[67] Maja Pantic,et al. End-to-End Multi-View Lipreading , 2017, BMVC.
[68] Ben P. Milner,et al. Generating Intelligible Audio Speech From Visual Speech , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[69] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[70] Jürgen Schmidhuber,et al. Improving Speaker-Independent Lipreading with Domain-Adversarial Training , 2017, INTERSPEECH.
[71] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Gerasimos Potamianos,et al. Exploring ROI size in deep learning based lipreading , 2017, AVSP.
[73] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[74] Shmuel Peleg,et al. Visual Speech Enhancement using Noise-Invariant Training , 2017, ArXiv.
[75] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[76] Themos Stafylakis,et al. Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs , 2018, Comput. Vis. Image Underst..
[77] Kaiming He,et al. Group Normalization , 2018, ECCV.
[78] Joon Son Chung,et al. LRS3-TED: a large-scale dataset for visual speech recognition , 2018, ArXiv.
[79] Kai Xu,et al. LCANet: End-to-End Lipreading with Cascaded Attention-CTC , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[80] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[81] Liangliang Cao,et al. Lip2Audspec: Speech Reconstruction from Silent Lip Movements Video , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[82] Shmuel Peleg,et al. Visual Speech Enhancement , 2017, INTERSPEECH.
[83] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.