An Alphanet approach to optimising input transformations for continuous speech recognition

The authors extend to continuous speech recognition (CSR) the Alphanet approach to integrating backprop networks and HMM (hidden Markov model)-based isolated word recognition. They present the theory of a method for discriminative training of components of a CSR system, using training data in the form of complete sentences. The derivatives of the discriminative score with respect to the parameters are expressed in terms of the posterior probabilities of state occupancies (gammas) under two conditions called 'clamped' and 'free' because they correspond to the two conditions in Boltzmann machine training. The authors compute these clamped and free gammas using the forward-backward algorithm twice, and use the differences to drive the adaptation of a preprocessing data transformation, which can be thought of as replacing the linear transformation which yields MFCCs, or which normalizes a grand covariance matrix.<<ETX>>