Speaker Verification Robust to Talking Style Variation Using Multiple Kernel Learning Based on Conditional Entropy Minimization

We developed a new speaker verification system that is robust to intra-speaker variation. Intra-speaker variation is likely to arise from changes in talking style, in the periods during which an individual speaks, and so on, and such variation is well known to degrade the performance of speaker verification systems. To address this problem, we applied multiple kernel learning (MKL) based on conditional entropy minimization, which forces the data of each speaker class to be compactly aggregated while keeping different speaker classes far apart from each other. Experimental results showed that the proposed system was more robust to intra-speaker variation caused by changes in talking style than a conventional maximum-margin-based system.
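The abstract does not spell out the optimization, so the following Python sketch only illustrates the general idea under our own assumptions. Precomputed Gram matrices K_m are combined as K = sum_m beta_m K_m with nonnegative weights summing to one. The conditional entropy H(Y|X) is then estimated with a leave-one-out soft nearest-neighbour posterior built from feature-space distances d_ij^2 = K_ii + K_jj - 2 K_ij. The estimator, the grid search over two kernels, the bandwidth `sigma`, the toy data, and all function names (`combine_kernels`, `conditional_entropy`, `learn_weights`, `linear_kernel`) are hypothetical illustrations, not the method used in the paper.

```python
import math


def linear_kernel(x):
    """Gram matrix of a linear kernel on scalar features (toy helper, hypothetical)."""
    return [[xi * xj for xj in x] for xi in x]


def combine_kernels(kernels, weights):
    """Return K = sum_m weights[m] * kernels[m] for precomputed Gram matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


def conditional_entropy(K, labels, sigma=1.0):
    """Leave-one-out plug-in estimate of H(Y|X) in the feature space of K.

    Assumed estimator (not taken from the paper): p(y_i | x_i) is a soft
    nearest-neighbour posterior computed from squared feature-space distances
    d_ij^2 = K_ii + K_jj - 2 K_ij. Low values mean each class is compact
    and far from the other classes.
    """
    n = len(K)
    total = 0.0
    for i in range(n):
        d2 = {j: K[i][i] + K[j][j] - 2.0 * K[i][j] for j in range(n) if j != i}
        shift = min(d2.values())  # subtract the minimum distance to avoid underflow
        same = other = 0.0
        for j, d in d2.items():
            w = math.exp(-(d - shift) / sigma)
            if labels[j] == labels[i]:
                same += w
            else:
                other += w
        # Clamp so a sample with no same-class neighbour does not give log(0).
        total -= math.log(max(same / (same + other), 1e-12))
    return total / n


def learn_weights(kernels, labels, steps=10, sigma=1.0):
    """Grid search over two-kernel weights (beta, 1 - beta) minimising H(Y|X)."""
    best_h, best_beta = float("inf"), None
    for k in range(steps + 1):
        beta = (k / steps, 1.0 - k / steps)
        h = conditional_entropy(combine_kernels(kernels, beta), labels, sigma)
        if h < best_h:
            best_h, best_beta = h, beta
    return best_beta, best_h


if __name__ == "__main__":
    # Hypothetical toy data: feature a separates the two speakers, while
    # feature b groups utterances across speakers (e.g. by talking style).
    a = [0.0, 0.2, 3.0, 3.2]
    b = [0.0, 3.0, 0.1, 3.1]
    labels = [0, 0, 1, 1]
    beta, h = learn_weights([linear_kernel(a), linear_kernel(b)], labels)
    print(beta, round(h, 4))
```

With these toy values the search should put all of the weight on the kernel built from `a`, because adding weight on `b` pulls same-speaker utterances apart and pushes different-speaker utterances together, which raises the estimated conditional entropy. The exhaustive grid is used only for readability; with many kernels, a gradient-based or other constrained optimizer over the weight simplex would be the more practical choice.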