Study of a Fast Discriminative Training Algorithm for Pattern Recognition

Discriminative training is an approach to pattern recognition that directly minimizes a cost function commensurate with the performance of the recognition system. This contrasts with the estimation of probability distributions conventionally required in the Bayes formulation of the statistical pattern recognition problem. Currently, most discriminative training algorithms for nonlinear classifier designs are based on gradient-descent (GD) methods for cost minimization. These algorithms are easy to derive and effective in practice, but they train slowly and their learning rates are difficult to select. To address these problems, we present a study of a fast discriminative training algorithm. The algorithm initializes the parameters with the expectation-maximization (EM) algorithm and then uses a set of closed-form formulas derived in this paper to further optimize a proposed objective of minimizing the error rate. Experiments in speech applications show that the algorithm provides better recognition accuracy in fewer iterations than the EM algorithm and than a neural network trained by hundreds of GD iterations. Although some convergence properties need further research, the proposed objective and derived formulas can benefit further study of the problem.
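To make the contrast with GD-based training concrete, the following is a minimal sketch of the conventional baseline, not the closed-form updates derived in the paper (which are not reproduced in this abstract). It assumes a two-class problem with one unit-variance Gaussian per class, uses the per-class sample mean as the maximum-likelihood starting point (the single-Gaussian case that EM reduces to), and then runs gradient descent on a sigmoid-smoothed error count. All function names, the smoothing constant `alpha`, and the learning rate `lr` are illustrative assumptions.

```python
import numpy as np

def smoothed_error(means, x, y, alpha=2.0):
    """Sigmoid-smoothed error rate of a two-class, unit-variance Gaussian classifier."""
    # Discriminant g_k(x): class log-likelihood up to an additive constant.
    g = -0.5 * (x[:, None] - means[None, :]) ** 2
    idx = np.arange(len(x))
    # Misclassification measure: competing-class score minus true-class score.
    d = g[idx, 1 - y] - g[idx, y]
    loss = 1.0 / (1.0 + np.exp(-alpha * d))  # near 1 when the sample is misclassified
    return loss.mean(), loss

def gradient(means, x, y, alpha=2.0):
    """Gradient of the mean smoothed error with respect to the two class means."""
    _, loss = smoothed_error(means, x, y, alpha)
    dl_dd = alpha * loss * (1.0 - loss)  # derivative of the sigmoid w.r.t. d
    grad = np.zeros(2)
    for k in range(2):
        # d contains -g_k for samples of class k and +g_k for the other class,
        # and dg_k/dmu_k = (x - mu_k).
        sign = np.where(y == k, -1.0, 1.0)
        grad[k] = np.mean(dl_dd * sign * (x - means[k]))
    return grad

def train(x, y, lr=0.1, steps=200):
    """Maximum-likelihood initialization followed by GD on the smoothed error."""
    init = np.array([x[y == 0].mean(), x[y == 1].mean()])
    means = init.copy()
    for _ in range(steps):
        means = means - lr * gradient(means, x, y)  # lr has to be hand-tuned
    return init, means, smoothed_error(init, x, y)[0], smoothed_error(means, x, y)[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data (assumed): class 1 is skewed, so its sample mean is not
    # the best choice for minimizing classification error.
    x0 = rng.normal(-1.0, 1.0, 200)
    x1 = np.concatenate([rng.normal(1.0, 1.0, 150), rng.normal(4.0, 1.0, 50)])
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])
    init, means, loss_before, loss_after = train(x, y)
    print("initial means:", init, "smoothed error:", loss_before)
    print("trained means:", means, "smoothed error:", loss_after)
```

The sketch shows the two costs the abstract attributes to GD: the loop needs many small steps, and `lr` must be chosen by hand. A closed-form update would replace the body of the loop with a direct re-estimation formula.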
