A study of auditory modeling and processing for speech signals

We study a modified version of a computational model of the human peripheral and central auditory system (Wang, K. and Shamma, S.A., 1995; Yang, X. et al., 1992) and examine the validity of its output from two practical perspectives. The first treats the well-known Mel-frequency cepstral coefficients (MFCC) as an approximate representation of the output of physiology-based early auditory processing. The second derives feature vectors from the dimension-expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition to confirming the relevance of the model within an existing statistical speech recognition framework, we conduct a preliminary study of the cortical response in connection with known physiological studies, looking for new ways to use the auditory model to perform cognitive functions based on a better understanding of the human auditory system. In particular, the cortical response may be a place-coded data set in which sounds are categorized according to the regions containing their most distinguishing features. The results of this study encourage us to develop hierarchical, detection-based methods in which this mechanism may be used to simulate a variety of human perceptual and cognitive functions.

[1] L. Tan, et al. Distinct brain regions associated with syllable and phoneme, 2003, Human Brain Mapping.

[2] Kuansan Wang, et al. Auditory representations of acoustic signals, 1992, IEEE Trans. Inf. Theory.

[3] Richard Lippmann, et al. Speech recognition by machines and humans, 1997, Speech Commun.

[4] David G. Stork, et al. Pattern Classification, 1973.

[5] Juyang Weng, et al. Using Discriminant Eigenfeatures for Image Retrieval, 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[6] J. Rauschecker, et al. Hierarchical Organization of the Human Auditory Cortex Revealed by Functional Magnetic Resonance Imaging, 2001, Journal of Cognitive Neuroscience.

[7] Hsiao-Wuen Hon, et al. Speaker-independent phone recognition using hidden Markov models, 1989, IEEE Trans. Acoust. Speech Signal Process.

[8] Kuansan Wang, et al. Spectral shape analysis in the central auditory system, 1995, IEEE Trans. Speech Audio Process.

[9] Alan C. Evans, et al. Left‐hemisphere specialization for the processing of acoustic transients, 1997, Neuroreport.

[10] Kuansan Wang, et al. Self-normalization and noise-robustness in early auditory representations, 1994, IEEE Trans. Speech Audio Process.