Continuous speech recognition via centisecond acoustic states

Continuous speech was treated as if produced by a finite‐state machine making a transition every centisecond. The observable output from state transitions was considered to be a power spectrum—a probabilistic function of the target state of each transition. Using this model, observed sequences of power spectra from real speech were decoded as sequences of acoustic states by means of the Viterbi trellis algorithm. The finite‐state machine used as a representation of the speech source was composed of machines representing words, combined according to a “language model.” When trained to the voice of a particular speaker, the decoder recognized seven‐digit telephone numbers correctly 96% of the time, with a better than 99% per‐digit accuracy. Results for other tests of the system, including syllable and phoneme recognition, will also be given.
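
The decoding step described above can be illustrated with a minimal log-domain Viterbi sketch. It is not the authors' implementation: the two-state left-to-right machine, the probabilities, and the names `viterbi`, `safe_log`, `log_init`, `log_trans`, and `log_emit` are all hypothetical, chosen only to show how a sequence of per-frame (centisecond) spectrum likelihoods is mapped to the most probable state sequence. The composition of word machines under a language model is not shown.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def viterbi(log_init, log_trans, log_emit):
    """Return (best_path, best_log_prob) for one observation sequence.

    log_init[s]     : log probability of starting in state s
    log_trans[r][s] : log probability of the transition r -> s
    log_emit[t][s]  : log likelihood of frame t's spectrum given that the
                      transition at frame t entered state s
    """
    n_states = len(log_init)
    # Scores of the best partial path ending in each state at frame 0.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t

    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Pick the predecessor that maximizes the path score into s.
            best_r = max(range(n_states),
                         key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best_r] + log_trans[best_r][s]
                             + log_emit[t][s])
            pointers.append(best_r)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Toy example (all numbers are made up): a two-state left-to-right machine
# observed over four frames.
log_init = [safe_log(p) for p in (1.0, 0.0)]
log_trans = [[safe_log(p) for p in row]
             for row in ((0.5, 0.5), (0.0, 1.0))]
log_emit = [[safe_log(p) for p in row]
            for row in ((0.9, 0.1), (0.8, 0.2), (0.2, 0.8), (0.1, 0.9))]

path, log_prob = viterbi(log_init, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working in log probabilities avoids numerical underflow, which matters when a single utterance spans hundreds of centisecond frames. In a full recognizer along the lines described above, the decoded state sequence would then be read off as the words whose machines the path passes through.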