Speaker normalization and adaptation using second-order connectionist networks

A method for speaker normalization and adaptation using connectionist networks is developed. A speaker-specific linear transformation of observations of the speech signal is computed using second-order network units. Classification is accomplished by a multilayer feedforward network that operates on the normalized speech data. The network is adapted for a new talker by modifying the transformation parameters while leaving the classifier fixed. This is accomplished by backpropagating classification error through the classifier to the second-order transformation units. The method was evaluated on the classification of ten vowels for 76 speakers using the first two formant values of the Peterson-Barney data. The results suggest that this method can achieve rapid speaker adaptation with high classification accuracy.
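The adaptation step described above can be illustrated with a minimal sketch. It is not the author's implementation. It assumes the speaker-specific transformation can be written directly as an affine map z = A x + b, and it uses a tiny classifier with hypothetical hand-picked weights (W1, W2) and made-up two-dimensional data standing in for formant values. Only A and b are updated; the classifier weights stay fixed while the classification error is backpropagated through them.

```python
import math

# Fixed classifier with hypothetical hand-picked weights (never updated):
# 2 inputs -> 2 tanh hidden units -> 2 softmax outputs.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.5, -1.5], [-1.5, 1.5]]

def forward(A, b, x):
    """Apply the speaker transform z = A x + b, then the fixed classifier."""
    z = [A[i][0] * x[0] + A[i][1] * x[1] + b[i] for i in range(2)]
    h = [math.tanh(W1[j][0] * z[0] + W1[j][1] * z[1]) for j in range(2)]
    o = [W2[k][0] * h[0] + W2[k][1] * h[1] for k in range(2)]
    m = max(o)
    e = [math.exp(v - m) for v in o]
    total = sum(e)
    return h, [v / total for v in e]

def accuracy(A, b, data):
    """Fraction of tokens whose most probable class matches the label."""
    hits = 0
    for x, y in data:
        _, p = forward(A, b, x)
        hits += int(p.index(max(p)) == y)
    return hits / len(data)

def adapt(data, steps=2000, lr=0.05):
    """Gradient descent on A and b only; cross-entropy error is
    backpropagated through the fixed classifier to the transform."""
    A = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity transform
    b = [0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        gA = [[0.0, 0.0], [0.0, 0.0]]
        gb = [0.0, 0.0]
        for x, y in data:
            h, p = forward(A, b, x)
            # Error at the softmax outputs (cross-entropy loss).
            d_o = [p[k] - (1.0 if k == y else 0.0) for k in range(2)]
            # Error at the hidden pre-activations.
            d_h = [sum(W2[k][j] * d_o[k] for k in range(2)) * (1.0 - h[j] ** 2)
                   for j in range(2)]
            # Error at the normalized inputs z.
            d_z = [sum(W1[j][i] * d_h[j] for j in range(2)) for i in range(2)]
            for i in range(2):
                gb[i] += d_z[i] / n
                for j in range(2):
                    gA[i][j] += d_z[i] * x[j] / n
        for i in range(2):
            b[i] -= lr * gb[i]
            for j in range(2):
                A[i][j] -= lr * gA[i][j]
    return A, b

# Made-up tokens for a "new talker" whose feature space is shifted
# relative to the one the fixed classifier expects.
new_speaker = [((-0.3, 0.3), 0), ((-0.1, 0.5), 0),
               ((-1.3, 1.3), 1), ((-1.1, 1.5), 1)]

identity = [[1.0, 0.0], [0.0, 1.0]]
before = accuracy(identity, [0.0, 0.0], new_speaker)
A, b = adapt(new_speaker)
after = accuracy(A, b, new_speaker)
print(before, after)
```

In this toy setting the fixed classifier mislabels the shifted tokens of one class until the transform is adapted. The abstract's second-order units would instead let a speaker-identity input select the transform parameters; that gating is omitted here for brevity.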

[1]  Harvey M. Sussman,et al.  A neuronal model of vowel normalization and representation , 1986, Brain and Language.

[2]  D. Shankweiler,et al.  What information enables a listener to map a talker's vowel space? , 1974, The Journal of the Acoustical Society of America.

[3]  G. E. Peterson,et al.  Control Methods Used in a Study of the Vowels , 1951 .

[4]  J. Mullennix,et al.  Some effects of talker variability on spoken word recognition. , 1989, The Journal of the Acoustical Society of America.

[5]  Raymond L. Watrous Current status of Peterson-Barney vowel formant data. , 1991, The Journal of the Acoustical Society of America.

[6]  D. Friedman On the dimensionality of steady-state vowel normalization , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  P. Ladefoged Three areas of experimental phonetics , 1967 .

[8]  D. Shankweiler,et al.  What information enables a listener to map a talker's vowel space? , 1976, The Journal of the Acoustical Society of America.

[9]  David G. Luenberger,et al.  Linear and nonlinear programming , 1984 .

[10]  W. Strange Evolving theories of vowel perception. , 1987, The Journal of the Acoustical Society of America.

[11]  J. C. Steinberg,et al.  Toward the Specification of Speech , 1950 .

[12]  W. A. Ainsworth,et al.  Intrinsic and Extrinsic Factors in Vowel Judgements , 1975 .

[13]  S. F. Disner Evaluation of vowel normalization procedures. , 1980, The Journal of the Acoustical Society of America.

[14]  Stephen Cox,et al.  Unsupervised speaker adaptation by probabilistic spectrum fitting , 1989, International Conference on Acoustics, Speech, and Signal Processing.

[15]  Geoffrey E. Hinton A Parallel Computation that Assigns Canonical Object-Based Frames of Reference , 1981, IJCAI.

[16]  Steven J. Nowlan,et al.  Maximum Likelihood Competitive Learning , 1989, NIPS.

[17]  L. Gerstman Classification of self-normalized vowels , 1968 .

[18]  Raymond L. Watrous Context-modulated vowel discrimination using connectionist networks , 1991 .

[19]  Alex Waibel,et al.  The Meta-Pi network: connectionist rapid adaptation for high-performance multi-speaker phoneme recognition , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[20]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .