Connected-digit speaker-dependent speech recognition using a neural network with time-delayed connections

An analog neural network that can be taught to recognize stimulus sequences is used to recognize the digits in connected speech. The circuit computes in the analog domain, using linear circuits for signal filtering and nonlinear circuits for simple decisions, feature extraction, and noise suppression. An analog perceptron learning rule is used to organize the subset of connections in the circuit that are specific to the chosen vocabulary. Computer simulations of the learning algorithm and circuit demonstrate recognition scores above 99% for a single-speaker connected-digit database. There is no clock: the circuit is data driven, and no endpoint detection or segmentation of the speech signal is needed during recognition. Training in the presence of noise provides noise immunity up to the trained level. For the speech problem studied, the circuit connections need only be accurate to about 3 bits of digitization depth for optimum performance. The algorithm maps efficiently onto analog neural network hardware.
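As an illustration of what a connection accuracy of about 3 bits could mean in practice, the following minimal sketch snaps a set of connection strengths onto 2^3 = 8 uniformly spaced levels. It is not the authors' procedure: the function name `quantize_weights`, the symmetric range, and the uniform level spacing are all assumptions made here for illustration.

```python
def quantize_weights(weights, bits=3):
    """Snap each weight to one of 2**bits uniformly spaced levels.

    Hypothetical helper: the levels span [-w_max, +w_max], where w_max is
    the largest weight magnitude. The uniform, symmetric grid is an
    assumption, not something stated in the abstract.
    """
    w_max = max(abs(w) for w in weights)
    if w_max == 0.0:
        # All weights are zero, so there is nothing to quantize.
        return list(weights)
    levels = 2 ** bits                  # 8 levels for 3 bits
    step = 2.0 * w_max / (levels - 1)   # spacing between adjacent levels
    # Map each weight to the index of its nearest level, then back to a value.
    return [round((w + w_max) / step) * step - w_max for w in weights]


if __name__ == "__main__":
    example = [0.9, -0.35, 0.1, -1.0, 0.62, 0.0]
    print(quantize_weights(example))
```

Under these assumptions, every quantized weight lies within half a level spacing of its original value, which gives one rough way to test in simulation how sensitive a trained set of connections is to limited precision.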
