Phoneme Discrimination Using Connectionist Networks

The application of connectionist networks to speech recognition is assessed using a set of eight representative phonetic discrimination problems chosen with respect to a theory of phonetics. A connectionist network model called the temporal flow model (TFM) is defined, which represents temporal relationships using delay links and permits general patterns of connectivity. It is argued that the model has properties appropriate for time-varying signals such as speech. Networks are trained using gradient descent methods of iterative nonlinear optimization to reduce the mean-squared error between the actual and desired responses of the output units. Separate network solutions are demonstrated for all eight phonetic discrimination problems for one male speaker. The network solutions are analyzed and shown in every case to make use of known acoustic phonetic cues, though they vary in the degree to which they rely on context-dependent cues to achieve phoneme recognition. Tested on data not used for training, the network solutions achieved an average accuracy of 99.5%. It is concluded that acoustic phonetic speech recognition can be accomplished using connectionist networks.
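The two mechanisms named in the abstract, delay links and gradient descent on mean-squared error, can be illustrated with a minimal sketch. It is not the TFM itself: it assumes a single sigmoid output unit, a made-up pair of toy input sequences (a rising and a falling ramp), and plain fixed-step gradient descent rather than the optimization methods used in the paper. All names (`forward`, `train`, `DATA`, and so on) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b):
    """Response of one unit at each time step.

    w[d] is the weight on a delay link that carries the input
    from d steps in the past; inputs before t = 0 are treated as absent.
    """
    out = []
    for t in range(len(x)):
        z = b
        for d, wd in enumerate(w):
            if t - d >= 0:
                z += wd * x[t - d]
        out.append(sigmoid(z))
    return out

def mse(y, target):
    """Mean-squared error between actual and desired responses."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target)) / len(y)

def train(data, n_delays=3, lr=0.5, epochs=500):
    """Fixed-step gradient descent on the summed per-sequence MSE (an assumption)."""
    w = [0.0] * n_delays
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * n_delays
        gb = 0.0
        for x, target in data:
            y = forward(x, w, b)
            n = len(x)
            for t in range(n):
                # d(MSE)/dz at time t, using sigmoid'(z) = y * (1 - y)
                delta = 2.0 * (y[t] - target[t]) * y[t] * (1.0 - y[t]) / n
                gb += delta
                for d in range(n_delays):
                    if t - d >= 0:
                        gw[d] += delta * x[t - d]
        w = [wd - lr * g for wd, g in zip(w, gw)]
        b -= lr * gb
    return w, b

# Toy data (hypothetical): the unit should respond 1 to a rising input, 0 to a falling one.
RISING = [0.0, 0.25, 0.5, 0.75, 1.0]
FALLING = [1.0, 0.75, 0.5, 0.25, 0.0]
DATA = [(RISING, [1.0] * 5), (FALLING, [0.0] * 5)]

if __name__ == "__main__":
    w, b = train(DATA)
    loss = sum(mse(forward(x, w, b), t) for x, t in DATA)
    print("weights:", w, "bias:", b, "summed MSE:", loss)
```

Because the two ramps cover the same values, the current input alone cannot separate them; the delayed inputs let the unit respond to the direction of change, which is the kind of temporal relationship that delay links are meant to capture.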
