The Meta-Pi network: connectionist rapid adaptation for high-performance multi-speaker phoneme recognition

A multinetwork connectionist architecture based on time-delay neural networks (TDNNs) is presented that allows multispeaker phoneme discrimination (/b,d,g/) to be performed at the speaker-dependent recognition rate of 98.4%. The overall network gates the phonemic decisions of modules trained on individual speakers to form its overall classification decision. By dynamically adapting to the input speech and focusing on a combination of speaker-specific modules, the network outperforms a single TDNN trained on the speech of all six speakers (95.9%). To train this network, a form of multiplicative connection called the Meta-Pi connection is developed. The Meta-Pi paradigm is shown to implement a dynamically adaptive Bayesian MAP classifier. Without supervision, the network learns to recognize the speech of one particular speaker (99.8%) using exclusively a dynamic combination of its internal models of other speakers. The Meta-Pi model is a viable basis for a connectionist speech recognition system that can rapidly adapt to new speakers and varying speaker dialects.
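
The gating idea described above can be illustrated with a minimal sketch. It assumes that each speaker-specific module emits a probability vector over the three phonemes and that a gating network produces one score per module. The scores are normalized with a softmax and used to form a weighted sum of the module outputs, which can be read as a mixture of speaker-conditional posteriors. The function names, the softmax normalization, and all numbers below are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

PHONEMES = ["b", "d", "g"]

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def meta_pi_combine(module_outputs, gate_scores):
    """Combine speaker-module outputs with input-dependent gate weights.

    module_outputs: array of shape (K, C), one row of class
        probabilities per speaker module (hypothetical values).
    gate_scores: array of shape (K,), raw scores from a gating
        network (hypothetical values).
    Returns the combined class distribution and the gate weights.
    """
    weights = softmax(gate_scores)       # non-negative, sums to 1
    combined = weights @ module_outputs  # weighted sum over modules
    return combined, weights

# Hypothetical outputs of three speaker-specific modules for one input.
module_outputs = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.80, 0.10, 0.10],
])
# Hypothetical gate scores: the first module is judged most relevant.
gate_scores = np.array([2.0, -1.0, 1.0])

combined, weights = meta_pi_combine(module_outputs, gate_scores)
predicted = PHONEMES[int(np.argmax(combined))]
print(weights, combined, predicted)
```

Because the gate weights are computed from the input itself, the combination can shift toward whichever speaker modules best match the current speech. In a trained system both the modules and the gating network would be TDNNs; here they are replaced by fixed arrays to keep the sketch self-contained.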