Review of Neural Networks for Speech Recognition

The performance of current speech recognition systems is far below that of humans. Neural nets offer the potential of providing massive parallelism, adaptation, and new algorithmic approaches to problems in speech recognition. Initial studies have demonstrated that multilayer networks with time delays can provide excellent discrimination between small sets of pre-segmented, difficult-to-discriminate words, consonants, and vowels. Performance for these small vocabularies has often exceeded that of more conventional approaches. Physiological front ends have provided improved recognition accuracy in noise, and a cochlea filter-bank that could be used in these front ends has been implemented using micro-power analog VLSI techniques. Techniques have been developed to scale networks up in size to handle larger vocabularies, to reduce training time, and to train nets with recurrent connections. Multilayer perceptron classifiers are being integrated into conventional continuous-speech recognizers. Neural net architectures have been developed to perform the computations required by vector quantizers, static pattern classifiers, and the Viterbi decoding algorithm. Further work is necessary to address large-vocabulary continuous-speech problems, to develop training algorithms that progressively build internal word models, and to develop compact VLSI neural net hardware.
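For readers unfamiliar with the Viterbi decoding algorithm mentioned above, the following is a minimal sketch of the conventional dynamic-programming form for a discrete hidden Markov model. It is not the neural net architecture the review refers to; the function name `viterbi` and the parameter names `log_init`, `log_trans`, and `log_emit` are assumptions chosen for illustration only.

```python
import math


def viterbi(log_init, log_trans, log_emit, observations):
    """Return (best_path, best_log_prob) for a discrete HMM.

    All names here are illustrative assumptions:
      log_init[s]     : log P(state s at time 0)
      log_trans[r][s] : log P(state s at time t+1 | state r at time t)
      log_emit[s][o]  : log P(observation o | state s)
    """
    n_states = len(log_init)
    # score[s] is the best log-probability of any path ending in state s.
    score = [log_init[s] + log_emit[s][observations[0]] for s in range(n_states)]
    backptrs = []
    for obs in observations[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Pick the predecessor state that maximizes the path score into s.
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[s][obs])
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state to recover the state sequence.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]
```

Working in log-probabilities keeps long observation sequences from underflowing. The inner step is a maximum over predecessor states followed by an addition, the kind of computation the review says neural net architectures have been developed to perform.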
