A multinetwork time-delay-neural-network (TDNN)-based connectionist architecture is presented that allows multispeaker phoneme discrimination (/b,d,g/) to be performed at the speaker-dependent recognition rate of 98.4%. The overall network gates the phonemic decisions of modules trained on individual speakers to form its overall classification decision. By dynamically adapting to the input speech and focusing on a combination of speaker-specific modules, the network outperforms a single TDNN trained on the speech of all six speakers (95.9%). To train this network, a form of multiplicative connection called the Meta-Pi connection is developed. The Meta-Pi paradigm is shown to implement a dynamically adaptive Bayesian MAP classifier. The network learns, without supervision, to recognize the speech of one particular speaker (99.8%) using exclusively a dynamic combination of internal models of other speakers. The Meta-Pi model is a viable basis for a connectionist speech recognition system that can rapidly adapt to new speakers and varying speaker dialects.
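The gating idea described in the abstract can be illustrated with a minimal sketch. The abstract gives no equations, so the details here are assumptions: the gating weights are taken to be a softmax over hypothetical gate scores, each speaker-specific module is assumed to emit a posterior over /b,d,g/, and the function names (`softmax`, `meta_pi_combine`, `classify`) and all numbers are invented for illustration. No training procedure is shown.

```python
import math

def softmax(scores):
    """Normalize raw gate scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def meta_pi_combine(gate_scores, module_posteriors):
    """Weight each module's class posterior by its gate weight and sum.

    Assumption: the combination is a weighted mixture of module outputs,
    i.e. the gate multiplies each module's decision before summation.
    """
    weights = softmax(gate_scores)
    n_classes = len(module_posteriors[0])
    combined = [
        sum(w * post[c] for w, post in zip(weights, module_posteriors))
        for c in range(n_classes)
    ]
    return weights, combined

def classify(combined, labels=("b", "d", "g")):
    """Pick the class with the largest combined posterior (MAP decision)."""
    best = max(range(len(combined)), key=combined.__getitem__)
    return labels[best]

# Hypothetical outputs of three speaker-specific modules for one utterance.
module_posteriors = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
# Hypothetical gate scores: the gate favors modules 0 and 2 for this input.
gate_scores = [2.0, 0.0, 2.0]

weights, combined = meta_pi_combine(gate_scores, module_posteriors)
decision = classify(combined)
```

Because the weights sum to one and each module output is a distribution, the combined output is itself a distribution over the three phonemes, which is what allows the final choice to be read as a MAP decision under these assumptions.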