Phoneme Recognition: Neural Networks vs

[...] phoneme recognition, which is characterized by two important properties: (1) Using a 3-layer arrangement of simple computing units, it can represent arbitrary nonlinear decision surfaces. The TDNN learns these decision surfaces automatically using error back-propagation [1]. (2) The time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independent of position in time, and hence not blurred by temporal shifts in the input. For comparison, several discrete Hidden Markov Models (HMM) were trained to perform the same task, i.e., the speaker-dependent recognition of the phonemes "B", "D", and "G" extracted [...] We show that the TDNN "invented" well-known acoustic-phonetic [...]
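The time-delay property described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `tdnn_layer`, the layer sizes, the random weights, and the toy input are all assumptions chosen for illustration. The sketch applies one set of weights to every window of consecutive frames, so a feature detector fires wherever its pattern occurs, and summing the activations over time gives a score that does not change when the input is shifted.

```python
import numpy as np

def tdnn_layer(x, w, b):
    """One time-delay layer (illustrative sketch).

    x: (T, F) array of T input frames with F coefficients each.
    w: (D, F, H) weights shared across time, covering D consecutive frames.
    b: (H,) biases.
    Returns (T - D + 1, H) sigmoid activations, one row per window position.
    """
    T, F = x.shape
    D, _, H = w.shape
    out = np.empty((T - D + 1, H))
    for t in range(T - D + 1):
        window = x[t:t + D]  # D consecutive frames starting at frame t
        # The same weights w are applied at every window position t.
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return 1.0 / (1.0 + np.exp(-out))

rng = np.random.default_rng(0)
T, F, D, H = 15, 16, 3, 8           # hypothetical sizes, for illustration only
w = rng.normal(size=(D, F, H))
b = rng.normal(size=H)

# Toy input: silence (zeros) with a 4-frame "event" starting at frame 3.
x_a = np.zeros((T, F))
x_a[3:7] = rng.normal(size=(4, F))
# The same event, 4 frames later; the rolled-in frames are all zeros.
x_b = np.roll(x_a, 4, axis=0)

h_a = tdnn_layer(x_a, w, b)
h_b = tdnn_layer(x_b, w, b)

# The hidden activations move with the event ...
shifted_equal = np.allclose(h_a[:-4], h_b[4:])
# ... so integrating over time gives the same score for both inputs.
score_a = h_a.sum(axis=0)
score_b = h_b.sum(axis=0)
scores_equal = np.allclose(score_a, score_b)
```

In this toy setup the equality holds because the event stays fully inside the input in both positions; an event cut off at the edge of the input would change the summed score.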

[1] F. Jelinek, et al. Continuous speech recognition by statistical methods, 1976, Proceedings of the IEEE.

[2] Michael Picheny, et al. Some experiments with large-vocabulary isolated-word sentence recognition, 1984, ICASSP.

[3] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.

[4] Richard P. Lippmann, et al. An introduction to computing with neural nets, 1987.

[5] John Makhoul, et al. BYBLOS: The BBN continuous speech recognition system, 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6] J. J. Hopfield, et al. Neural computation by concentrating information in time, 1987, Proceedings of the National Academy of Sciences of the United States of America.

[7] D. Zipser, et al. Learning the hidden structure of speech, 1988, The Journal of the Acoustical Society of America.

[8] Geoffrey E. Hinton, et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.