Shimon Whiteson | Nando de Freitas | Brendan Shillingford | Yannis M. Assael
[1] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[2] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[3] Johan A. du Preez,et al. Audio-Visual Speech Recognition using SciPy , 2010 .
[4] Shuicheng Yan,et al. Classification and Feature Extraction by Simplexization , 2008, IEEE Transactions on Information Forensics and Security.
[5] Emmanuel Ferragne,et al. Formant frequencies of vowels in 13 accents of the British Isles , 2010, Journal of the International Phonetic Association.
[6] Stephen J. Cox,et al. Improved speaker independent lip reading using speaker adaptive training and deep neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[8] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[9] Matti Pietikäinen,et al. A review of recent advances in visual speech decoding , 2014, Image Vis. Comput.
[10] Petros Maragos,et al. Adaptive multimodal fusion by uncertainty compensation , 2006, INTERSPEECH.
[11] F. Deland,et al. The story of lip-reading : its genesis and development , 1968 .
[12] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[13] Tetsuya Takiguchi,et al. Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss , 2016, INTERSPEECH.
[14] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[15] Thomas Brox,et al. Striving for Simplicity: The All Convolutional Net , 2014, ICLR.
[16] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell.
[17] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[18] Daniel Jurafsky,et al. Lexicon-Free Conversational Speech Recognition with Neural Networks , 2015, NAACL.
[19] Sridha Sridharan,et al. Patch-Based Representation of Visual Speech , 2006 .
[20] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[21] Petros Maragos,et al. Multimodal Fusion and Learning with Uncertain Features Applied to Audiovisual Speech Recognition , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.
[22] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[23] Amit Garg,et al. Lip reading using CNN and LSTM , 2016 .
[24] Barry-John Theobald,et al. Comparison of human and machine-based lip-reading , 2009, AVSP.
[25] Hermann Ney,et al. Deep Learning of Mouth Shapes for Sign Language , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).
[26] Stefanos Zafeiriou,et al. 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.
[27] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[28] M. Woodward,et al. Phoneme perception in lipreading. , 1960, Journal of speech and hearing research.
[29] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[30] R. D. Easton,et al. Perceptual dominance during lipreading , 1982, Perception & psychophysics.
[31] Jean-Philippe Thiran,et al. Information Theoretic Feature Extraction for Audio-Visual Speech Recognition , 2009, IEEE Transactions on Signal Processing.
[32] Tara N. Sainath,et al. Fundamental Technologies in Modern Speech Recognition (Digital Object Identifier 10.1109/MSP.2012.2205597) , 2012 .
[33] C. G. Fisher,et al. Confusions among visually perceived consonants. , 1968, Journal of speech and hearing research.
[34] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[35] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res.
[36] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Andrew Zisserman,et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.
[38] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[39] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[40] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2009, IEEE Transactions on Multimedia.
[41] A. Cruttenden. Gimson's Pronunciation of English , 1994 .
[42] Jürgen Schmidhuber,et al. Lipreading with long short-term memory , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[44] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[45] G. D. Magoulas,et al. Under review as a conference paper at ICLR 2017 , 2022 .
[46] Tetsuya Ogata,et al. Lipreading using convolutional neural network , 2014, INTERSPEECH.
[47] Petros Maragos,et al. Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition , 2009, IEEE Transactions on Audio, Speech, and Language Processing.