Speaker-Independent Digit Recognition Using a Neural Network with Time-Delayed Connections

The capability of a small neural network to perform speaker-independent recognition of spoken digits in connected speech has been investigated. The network uses time delays to organize the rapidly changing outputs of symbol detectors over the time scale of a word. The network is data driven and unclocked. Achieving useful accuracy in a speaker-independent setting required many new ideas and procedures, including improved feature detectors, self-recognition of word ends, reduction in network size, and division of speakers into natural classes. Quantitative experiments based on the Texas Instruments (TI) digit databases are described.
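
The role of the time delays can be illustrated with a toy sketch. It is not the network studied here: the frame-based loop, the binary detector outputs, the fixed integer delays, and the names `word_unit_activation` and `template` are all assumptions made for illustration. Each symbol detector feeds a word unit through a connection whose delay equals the time between that symbol and the end of the word, so detector outputs that occur at different times arrive together when the word ends.

```python
def word_unit_activation(detector_outputs, template):
    """Sum delayed symbol-detector outputs for one hypothetical word unit.

    detector_outputs: dict mapping symbol -> list of outputs, one per frame.
    template: list of (symbol, delay) pairs; `delay` is the number of frames
              by which that symbol is expected to precede the word end.
    Returns a list with the word unit's summed input at each frame.
    """
    n_frames = len(next(iter(detector_outputs.values())))
    activation = []
    for t in range(n_frames):
        total = 0.0
        for symbol, delay in template:
            src = t - delay  # frame whose output reaches the unit at time t
            if src >= 0:
                total += detector_outputs[symbol][src]
        activation.append(total)
    return activation


# Hypothetical word "a b c": symbols detected at frames 1, 3 and 5.
outputs = {
    "a": [0, 1, 0, 0, 0, 0, 0],
    "b": [0, 0, 0, 1, 0, 0, 0],
    "c": [0, 0, 0, 0, 0, 1, 0],
}
# Delays chosen so that all three contributions coincide at the word end.
template = [("a", 4), ("b", 2), ("c", 0)]

activation = word_unit_activation(outputs, template)
peak_frame = max(range(len(activation)), key=activation.__getitem__)

# Same symbols in reversed order: the delayed outputs no longer align.
reversed_outputs = {"a": outputs["c"], "b": outputs["b"], "c": outputs["a"]}
reversed_activation = word_unit_activation(reversed_outputs, template)
```

In this sketch the word unit's input peaks only at the frame where the word ends, and only when the symbols arrive in the expected order. The unit responds to the incoming data itself, with no external clock marking word boundaries.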
