A comparative study of noise estimation algorithms for VTS-based robust speech recognition

We present a comparative study of two noise estimation approaches, both developed in the past few years, for robust speech recognition using vector Taylor series (VTS). The first approach, iterative root finding (IRF), directly differentiates the expectation-maximization (EM) auxiliary function and approximates the root of the derivative through recursive refinement. The second approach, twofold expectation maximization (TEM), estimates the noise distributions by treating them as hidden variables in a modified EM procedure. Mathematical derivations reveal a substantial connection between the two approaches. Two experiments evaluate the performance and convergence rate of the algorithms: the first fits a Gaussian mixture model (GMM) to artificially corrupted samples generated through Monte Carlo simulation, and the second performs speech recognition on the Aurora 2 database.

Index Terms: robust speech recognition, vector Taylor series, noise estimation
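
To make the first experiment's setup concrete, the following is a minimal sketch of how clean samples might be artificially corrupted by noise through Monte Carlo simulation, and how a first-order VTS expansion approximates the statistics of the result. It is not the IRF or TEM estimator. The abstract does not state the mismatch function, so the sketch assumes the commonly used log-spectral additive-noise model y = x + log(1 + exp(n - x)) with no channel term, one dimension, and single Gaussians instead of a GMM. The function names `corrupt`, `vts_mean`, `vts_var`, and `monte_carlo` are hypothetical.

```python
import math
import random


def corrupt(x, n):
    """Assumed log-spectral mismatch function: y = log(exp(x) + exp(n))."""
    return x + math.log1p(math.exp(n - x))


def vts_mean(mu_x, mu_n):
    """First-order VTS mean: the mismatch function evaluated at the means."""
    return corrupt(mu_x, mu_n)


def vts_var(mu_x, var_x, mu_n, var_n):
    """First-order VTS variance, assuming x and n are independent.

    g is the derivative dy/dx at the expansion point; dy/dn is 1 - g.
    """
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x))
    return g * g * var_x + (1.0 - g) ** 2 * var_n


def monte_carlo(mu_x, var_x, mu_n, var_n, num=50_000, seed=0):
    """Draw clean speech and noise samples, corrupt them, return (mean, var)."""
    rng = random.Random(seed)
    sd_x, sd_n = math.sqrt(var_x), math.sqrt(var_n)
    ys = [corrupt(rng.gauss(mu_x, sd_x), rng.gauss(mu_n, sd_n))
          for _ in range(num)]
    mean = sum(ys) / num
    var = sum((y - mean) ** 2 for y in ys) / num
    return mean, var


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    mu_x, var_x, mu_n, var_n = 5.0, 0.25, 3.0, 0.25
    print("VTS        :", vts_mean(mu_x, mu_n), vts_var(mu_x, var_x, mu_n, var_n))
    print("Monte Carlo:", monte_carlo(mu_x, var_x, mu_n, var_n))
```

Under these assumptions the two sets of statistics should agree closely when the variances are small relative to the curvature of the mismatch function. A noise estimation algorithm works in the opposite direction, recovering the noise parameters from the corrupted samples.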
