Fast discriminative training for sequential observations with application to speaker identification

This paper presents a fast discriminative training algorithm for sequences of observations. It considers a sequence of feature vectors as one single composite token in training or testing. In contrast to the traditional EM algorithm, this algorithm is derived from a discriminative objective, aiming at directly minimizing the recognition error. Compared to the gradient-descent algorithms for discriminative training, this algorithm invokes a mild assumption which leads to closed-form formulas for re-estimation, rather than relying on gradient search, without sacrificing the algorithmic rigor. As such, it is in general much faster than a descent based algorithm and does not need to determine the learning rate or step size. Our experiment shows that the proposed algorithm reduces error rate by 14.65, 66.46, and 100.00% for 1, 5, and 10 seconds of testing data respectively, in a speaker identification application.

[1]  Biing-Hwang Juang,et al.  Discriminative learning for minimum error classification [pattern recognition] , 1992, IEEE Trans. Signal Process..

[2]  Qiru Zhou,et al.  Robust endpoint detection and energy normalization for real-time speech and speaker recognition , 2002, IEEE Trans. Speech Audio Process..

[3]  Biing-Hwang Juang,et al.  A new algorithm for fast discriminative training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..