On noise estimation for robust speech recognition using vector Taylor series

In this paper, we propose a novel noise variance estimation method, based on the fixed-point method, for VTS-based robust speech recognition. Noise parameters are re-estimated over a given utterance using an expectation-maximization (EM) algorithm. The derivative of the auxiliary function with respect to the noise variance is derived, and the fixed-point algorithm estimates the noise variance by iteratively approximating the root of that derivative. The method leads to a re-estimation formula that resembles the standard maximum likelihood (ML) variance estimate, and the iteration requires no step size. We also investigate ways to improve noise estimation for efficient VTS adaptation. Several fast noise estimation methods are examined, including estimation from non-speech areas and incremental adaptation. In evaluations on the Aurora 2 database, the proposed noise variance estimation method obtains a significant improvement in recognition accuracy over the method using the sample variance. Further experiments show that VTS ML estimation over non-speech areas is an effective fast adaptation method. The final refined approach achieves a word error rate (WER) of 8.75%, a 13% relative improvement over conventional VTS adaptation.
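The idea of a step-size-free, fixed-point variance update can be illustrated on a much simpler problem than the VTS model. The sketch below is not the paper's re-estimation formula. It assumes a scalar toy model in which each observation is the sum of a zero-mean "speech" term with known variance `s[t]` and a zero-mean noise term with unknown variance `v`. The function names (`simulate`, `estimate_noise_variance`, `score`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate(n_frames, true_v, seed=0):
    """Draw toy data: y[t] = x[t] + n[t], x[t] ~ N(0, s[t]), n[t] ~ N(0, true_v)."""
    rng = random.Random(seed)
    s = [rng.uniform(0.5, 1.5) for _ in range(n_frames)]   # known speech variances
    y = [rng.gauss(0.0, math.sqrt(st + true_v)) for st in s]
    return y, s


def estimate_noise_variance(y, s, v0=1.0, tol=1e-9, max_iter=10000):
    """EM-style fixed-point iteration for the noise variance v (toy model only).

    Each pass maps v to the average posterior second moment of the noise,
    which has the same form as an ML variance estimate and needs no step size.
    """
    v = v0
    for _ in range(max_iter):
        total = 0.0
        for yt, st in zip(y, s):
            gain = v / (st + v)        # weight of the noise component
            post_mean = gain * yt      # E[n_t | y_t]
            post_var = gain * st       # Var[n_t | y_t] = v * s_t / (s_t + v)
            total += post_mean ** 2 + post_var
        v_new = total / len(y)
        if abs(v_new - v) < tol:       # stop once the iterate no longer moves
            return v_new
        v = v_new
    return v


def score(y, s, v):
    """Per-frame average of d(log-likelihood)/dv, up to a factor of 1/2."""
    return sum(yt ** 2 / (st + v) ** 2 - 1.0 / (st + v)
               for yt, st in zip(y, s)) / len(y)


y, s = simulate(n_frames=4000, true_v=2.0)
v_hat = estimate_noise_variance(y, s)
print("estimated noise variance:", v_hat)
print("likelihood derivative at estimate:", score(y, s, v_hat))
```

In this toy setting the update can be written as `v_new = v + v**2 * score(y, s, v)`, so any fixed point with `v > 0` is a root of the likelihood derivative. This mirrors, in a simplified form, the root-finding view described in the abstract.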
