Auditory models with Kohonen SOFM and LVQ for speaker independent phoneme recognition

Neural networks employing unsupervised learning (Kohonen's self-organizing feature map, SOFM, and learning vector quantization, LVQ) were applied to the output of two different models of the auditory periphery to perform phoneme recognition. Experiments comparing these two auditory-model representations with mel-cepstral coefficients showed that the auditory models achieved significantly higher phoneme recognition accuracy under the conditions tested (high signal-to-noise ratio and a large database of speakers). However, the three representations produced different types of broad-class recognition errors. The Patterson auditory model representation performed best, with the highest overall phoneme and broad-class accuracy.
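The unsupervised network named above, Kohonen's SOFM, maps input feature vectors onto a low-dimensional grid of nodes by repeatedly pulling the best-matching node and its grid neighbors toward each input. The following is a minimal sketch of that training loop; the map size, learning-rate and neighborhood schedules, and toy 2-D data are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def train_sofm(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a Kohonen self-organizing feature map on `data` (n_samples x dim).

    Hyperparameters here are illustrative defaults, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    dim = data.shape[1]
    # One weight vector per map node, initialized randomly.
    w = rng.random((rows * cols, dim))
    # Grid coordinates of each node, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    n_steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            # Best-matching unit: node whose weights are closest to x.
            bmu = int(np.argmin(np.sum((w - x) ** 2, axis=1)))
            # Learning rate and neighborhood width decay linearly over time.
            frac = t / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.5
            # Gaussian neighborhood around the BMU on the map grid.
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull every node toward x, weighted by its neighborhood value.
            w += lr * h[:, None] * (x - w)
            t += 1
    return w

def quantization_error(data, w):
    # Mean distance from each input to its best-matching node.
    d = np.sqrt(((data[:, None, :] - w[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1).mean()
```

In the recognition setup described in the abstract, the inputs would be frames of the auditory-model (or mel-cepstral) representation rather than the random 2-D points used here, and LVQ would subsequently fine-tune the node weights with phoneme labels.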
