Structural Bayesian Linear Regression for Hidden Markov Models

Linear regression of Hidden Markov Model (HMM) parameters is widely used for adaptive training in time-series pattern analysis, especially in speech processing. The regression parameters are usually shared among sets of Gaussians in the HMMs, with the Gaussian clusters represented by a tree. This paper presents a fully Bayesian treatment of linear regression for HMMs that takes this regression tree structure into account, using variational techniques. We analytically derive the variational lower bound of the marginalized log-likelihood of the linear regression. Using this lower bound as an objective function, we can optimize the tree structure and the hyper-parameters of the linear regression algorithmically, rather than tuning them heuristically. Experiments on large vocabulary continuous speech recognition confirm the generalizability of the proposed approach, especially when the amount of adaptation data is limited.
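
The idea of choosing hyper-parameters by maximizing a marginalized log-likelihood, rather than tuning them by hand, can be illustrated with a much simpler model than the one in the paper. The sketch below is a toy analogue only: a scalar regression y = w*x + noise with a Gaussian prior on w, for which the marginal log-likelihood is available in closed form, so no variational bound is needed. The function names, the data, and the candidate grid are all assumptions made for illustration and are not taken from the paper.

```python
import math


def log_evidence(xs, ys, alpha, beta):
    """Exact log marginal likelihood of y = w*x + noise (toy model).

    Assumptions: prior w ~ N(0, 1/alpha), Gaussian noise with known
    precision beta. Marginally, y ~ N(0, I/beta + x x^T / alpha).
    """
    n = len(xs)
    xx = sum(x * x for x in xs)
    xy = sum(x * y for x, y in zip(xs, ys))
    yy = sum(y * y for y in ys)
    # log-determinant of the marginal covariance (matrix determinant lemma)
    log_det = -n * math.log(beta) + math.log(1.0 + beta * xx / alpha)
    # quadratic form y^T C^{-1} y (Sherman-Morrison identity)
    quad = beta * yy - (beta * xy) ** 2 / (alpha + beta * xx)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def select_alpha(xs, ys, beta, candidates):
    """Pick the prior precision that maximizes the log evidence."""
    return max(candidates, key=lambda a: log_evidence(xs, ys, a, beta))


# Hypothetical adaptation data and hyper-parameter grid.
xs = [0.5, 1.0, 1.5]
ys = [1.1, 1.9, 3.2]
grid = [10.0 ** k for k in range(-4, 5)]
best = select_alpha(xs, ys, beta=4.0, candidates=grid)
print(best)
```

In the paper's setting the same role is played by the variational lower bound, and the quantities being selected include the regression tree structure as well as the prior hyper-parameters; the grid search here simply stands in for that optimization.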

