Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition

We report on a Deep Neural Network frontend for a continuous speech recognizer based on surface Electromyography (EMG). Speech data are captured by facial surface electrodes recording the electrical activity of the articulatory muscles, allowing speech processing without recourse to the acoustic signal. The electromyographic signal is preprocessed and fed into the neural network, which is trained on framewise targets; the output layer activations are then processed by a Hidden Markov Model sequence classifier. We show that such a neural network frontend can be trained on EMG data and yields substantial improvements over previous systems, even though the available amount of training data is very small, amounting to just a few tens of sentences. On the EMG-UKA corpus, we obtain average evaluation set Word Error Rate improvements of more than 32% relative on context-independent phone models and 13% relative on versatile Bundled Phonetic Feature (BDPF) models, compared to a conventional system using Gaussian Mixture Models. Notably, on simple context-independent phone models, the new system yields results almost as good as with BDPF models, which were specifically designed to cope with small amounts of training data.
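The pipeline described above is a hybrid DNN-HMM setup: a network trained on framewise phone targets emits posterior probabilities, which are rescaled by the class priors to obtain emission scores for the HMM decoder. The following is a minimal sketch of that idea, assuming a toy two-layer MLP with randomly initialized weights (training is omitted), 25-dimensional EMG feature frames, and 40 phone classes; all names, dimensions, and the uniform priors are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # numerically stable softmax over the class axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FramewiseDNN:
    """Tiny MLP mapping EMG feature frames to framewise phone posteriors."""

    def __init__(self, n_in, n_hidden, n_phones):
        self.W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_hidden, n_phones)) * 0.1
        self.b2 = np.zeros(n_phones)

    def posteriors(self, frames):
        # frames: (n_frames, n_in) -> (n_frames, n_phones), rows sum to 1
        h = np.tanh(frames @ self.W1 + self.b1)
        return softmax(h @ self.W2 + self.b2)

def hmm_emission_scores(posteriors, priors, floor=1e-8):
    """Hybrid trick: scaled log-likelihoods log p(x|s) ~ log p(s|x) - log p(s),
    fed to the HMM sequence classifier in place of GMM emission scores."""
    return np.log(np.maximum(posteriors, floor)) - np.log(priors)

# Toy usage: 100 frames of 25-dim preprocessed EMG features, 40 phone classes.
net = FramewiseDNN(n_in=25, n_hidden=64, n_phones=40)
frames = rng.standard_normal((100, 25))
post = net.posteriors(frames)
priors = np.full(40, 1.0 / 40)          # placeholder; in practice, training-set frequencies
scores = hmm_emission_scores(post, priors)
```

The prior division is the standard hybrid-system step (posteriors divided by priors approximate scaled likelihoods), which is what lets a framewise classifier plug into a conventional HMM decoder.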
