Historical Perspective of the Field of ASR/NLU

The quest for a machine that can recognize and understand speech from any speaker, in any environment, has been the holy grail of speech recognition research for more than 70 years. Although we have made great progress in understanding how speech is produced and analyzed, and have advanced far enough to build and deploy a number of viable speech recognition systems in the field, we remain far from the ultimate goal of a machine that communicates naturally with any human being. This section documents the history of research in speech recognition and natural language understanding, pointing out the areas where great progress has been made and the challenges that remain to be solved.
