Glove-TalkII: Mapping Hand Gestures to Speech Using Neural Networks

Glove-TalkII is a system that translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to 10 control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a CyberGlove, a ContactGlove, a 3-space tracker, and a foot pedal), a parallel formant speech synthesizer, and 3 neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel network and a consonant network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed, user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly, with speech quality similar to that of a text-to-speech synthesizer but with far more natural-sounding pitch variations.
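
The gating arrangement described above can be illustrated with a minimal sketch. It assumes the gating network emits a single weight between 0 and 1 and that the two expert outputs are blended as a convex combination; the abstract says only that the gate "weights" the outputs, so the exact blending rule, the function name `mix_outputs`, and the convention that a gate of 1 means "pure vowel" are all assumptions made for illustration.

```python
from typing import List, Sequence


def mix_outputs(
    gate: float,
    vowel_out: Sequence[float],
    consonant_out: Sequence[float],
) -> List[float]:
    """Blend two parameter vectors with a scalar gate (hypothetical helper).

    Assumption: gate = 1.0 selects the vowel network's output,
    gate = 0.0 selects the consonant network's output, and values
    in between interpolate linearly between the two.
    """
    if len(vowel_out) != len(consonant_out):
        raise ValueError("both networks must emit the same number of parameters")
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    # Element-wise convex combination of the two expert outputs.
    return [gate * v + (1.0 - gate) * c for v, c in zip(vowel_out, consonant_out)]


if __name__ == "__main__":
    # Made-up synthesizer parameter values, for illustration only.
    vowel = [700.0, 1200.0]
    consonant = [300.0, 2500.0]
    print(mix_outputs(0.5, vowel, consonant))
```

Under this assumption, a gate value that moves smoothly between 0 and 1 produces a smooth transition between consonant and vowel parameters, which is consistent with the continuous gesture-to-parameter mapping the abstract describes.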
