Paper information - Glove-TalkII: an adaptive gesture-to-formant interface

Glove-TalkII: an adaptive gesture-to-formant interface

Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to 10 control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary and multiple languages, in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a Polhemus sensor, and a foot-pedal), a parallel formant speech synthesizer and 3 neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed, user-defined relationship between hand-position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency and stop consonants are produced with a fixed mapping from the input devices. One subject has trained for about 100 hours to speak intelligibly with Glove-TalkII. He passed through eight distinct stages while learning to speak. He speaks slowly with speech quality similar to a text-to-speech synthesizer but with far more natural-sounding pitch variations.
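The gating arrangement described in the abstract can be illustrated with a minimal sketch. The abstract only states that a gating network weights the outputs of the vowel and consonant networks; the convex blend used here (weight `gate` on the vowel output, `1 - gate` on the consonant output) is an assumption, as are all function names, the toy stand-in "networks", and the two-element parameter vectors. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    """Logistic function, used here to keep the toy gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def blend_outputs(gate: float, vowel_out: Vector, consonant_out: Vector) -> List[float]:
    """Weight two parameter vectors by a gate value in [0, 1].

    Assumption: a convex combination; the abstract does not give the formula.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(vowel_out) != len(consonant_out):
        raise ValueError("both networks must emit the same number of parameters")
    return [gate * v + (1.0 - gate) * c for v, c in zip(vowel_out, consonant_out)]


def gesture_to_parameters(
    hand_features: Vector,
    gating_net: Callable[[Vector], float],
    vowel_net: Callable[[Vector], Vector],
    consonant_net: Callable[[Vector], Vector],
) -> List[float]:
    """Run all three (hypothetical) networks on one input frame and blend."""
    gate = gating_net(hand_features)
    return blend_outputs(gate, vowel_net(hand_features), consonant_net(hand_features))


# Toy stand-ins for the three networks (purely illustrative, not trained models).
def toy_gating_net(x: Vector) -> float:
    return sigmoid(4.0 * x[0])          # more "open" hand -> closer to vowel


def toy_vowel_net(x: Vector) -> Vector:
    return [500.0 + 100.0 * x[1], 1500.0]   # pretend formant frequencies in Hz


def toy_consonant_net(x: Vector) -> Vector:
    return [300.0, 2500.0]


if __name__ == "__main__":
    frame = [0.0, 1.0]  # hypothetical hand-feature vector
    print(gesture_to_parameters(frame, toy_gating_net, toy_vowel_net, toy_consonant_net))
```

With a gate near 1 the output follows the vowel network, with a gate near 0 it follows the consonant network, and intermediate values give a smooth transition between the two, which is what a continuous gesture-to-formant mapping would need.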

Geoffrey E. Hinton | Sidney S. Fels
