Domain-Adversarial Training for Session Independent EMG-based Speech Recognition

We present our research on continuous speech recognition based on surface electromyography (EMG), where speech information is captured by electrodes attached to the speaker's face. This approach enables speech processing without requiring an acoustic signal; however, reattaching the EMG electrodes for a new recording session causes subtle changes in the recorded signal, which degrade recognition accuracy and thus pose a major challenge for practical application of the system. Building on the growing body of recent work on domain-adversarial training of neural networks, we present a system that adapts the neural network frontend of our recognizer to data from a new recording session, without requiring supervised enrollment.
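Domain-adversarial training is commonly realized with a gradient-reversal layer (Ganin & Lempitsky): a shared feature extractor feeds both a phonetic-target classifier, trained on labeled data from the reference session, and a session classifier whose gradients are reversed, so the frontend learns features that remain discriminative for recognition yet become indistinguishable across sessions. Below is a minimal PyTorch sketch of this mechanism; the framework choice, layer sizes, target counts, and training loop are illustrative assumptions, not the system described in the paper.

    # Minimal sketch of a domain-adversarial frontend with a gradient-reversal
    # layer. All sizes and names below are illustrative assumptions.
    import torch
    import torch.nn as nn
    from torch.autograd import Function

    class GradReverse(Function):
        """Identity in the forward pass; multiplies gradients by -lam backward."""
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output.neg() * ctx.lam, None  # no gradient for lam

    class DannFrontend(nn.Module):
        def __init__(self, n_features=32, n_targets=136, n_sessions=2, lam=0.1):
            super().__init__()
            self.lam = lam
            # Shared feature extractor over per-frame EMG features.
            self.features = nn.Sequential(
                nn.Linear(n_features, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
            )
            # Phonetic-target classifier (supervised on the reference session).
            self.label_clf = nn.Linear(256, n_targets)
            # Session classifier, trained through the gradient-reversal layer.
            self.domain_clf = nn.Linear(256, n_sessions)

        def forward(self, x):
            h = self.features(x)
            return self.label_clf(h), self.domain_clf(GradReverse.apply(h, self.lam))

    # Toy training step: labeled frames from the reference session plus
    # unlabeled frames from the new session (dummy data for illustration).
    model = DannFrontend()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    ce = nn.CrossEntropyLoss()

    x_ref = torch.randn(8, 32)             # labeled reference-session frames
    targets = torch.randint(0, 136, (8,))  # phonetic targets
    x_new = torch.randn(8, 32)             # unlabeled new-session frames

    opt.zero_grad()
    y_ref, d_ref = model(x_ref)
    _, d_new = model(x_new)
    loss = (ce(y_ref, targets)
            + ce(d_ref, torch.zeros(8, dtype=torch.long))   # session 0
            + ce(d_new, torch.ones(8, dtype=torch.long)))   # session 1
    loss.backward()
    opt.step()

Because the reversed gradient pushes the feature extractor to confuse the session classifier while the label loss keeps the features phonetically discriminative, only unlabeled data from the new session is needed for adaptation, matching the unsupervised-enrollment setting described above.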
