Multi-State Time Delay Networks for Continuous Speech Recognition

We present the "Multi-State Time Delay Neural Network" (MS-TDNN), an extension of the Time Delay Neural Network (TDNN) to robust word recognition. Unlike most other hybrid methods, the MS-TDNN embeds an alignment search procedure in the connectionist architecture and allows for word-level supervision. The resulting system can manage the sequential order of subword units while optimizing for recognizer performance. In this paper we present extensive new evaluations of this approach on speaker-dependent and speaker-independent connected alphabet recognition.
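
To make the idea of an alignment search over subword units concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes a simple dynamic-programming alignment in which each frame is assigned to exactly one state of a word model and the path may only stay in a state or advance to the next one. The names `word_score`, `frame_scores`, and `state_units` are hypothetical, and `frame_scores` stands in for whatever unit-level activations the network produces.

```python
def word_score(frame_scores, state_units):
    """Best monotonic alignment score of one word model against the frames.

    frame_scores[t][u]: score of subword unit u at frame t (hypothetical
    stand-in for unit-level network outputs).
    state_units: unit index for each state of the word, in order.
    """
    n_frames, n_states = len(frame_scores), len(state_units)
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")

    # best[s]: best accumulated score of any path ending in state s
    # at the current frame. Every path must start in the first state.
    best = [neg_inf] * n_states
    best[0] = frame_scores[0][state_units[0]]

    for t in range(1, n_frames):
        new = [neg_inf] * n_states
        for s in range(n_states):
            stay = best[s]
            advance = best[s - 1] if s > 0 else neg_inf
            prev = max(stay, advance)
            if prev > neg_inf:
                new[s] = prev + frame_scores[t][state_units[s]]
        best = new

    # Every path must end in the last state of the word.
    return best[-1]


# Toy example: 3 frames, 2 subword units.
scores = [[1, 0], [1, 0], [0, 1]]
print(word_score(scores, [0, 1]))  # unit 0 followed by unit 1
print(word_score(scores, [1, 0]))  # unit 1 followed by unit 0
```

In this toy setting, the word whose unit order matches the frames receives the higher score, which is the sense in which the alignment enforces the sequential order of subword units. Under word-level supervision, an error signal defined on such word scores would be propagated back along the best path; that training step is not shown here.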