Phone recognition using Restricted Boltzmann Machines

For decades, Hidden Markov Models (HMMs) have been the state-of-the-art technique for acoustic modeling despite their unrealistic independence assumptions and the very limited representational capacity of their hidden states. Conditional Restricted Boltzmann Machines (CRBMs) have recently proved to be very effective for modeling motion capture sequences and this paper investigates the application of this more powerful type of generative model to acoustic modeling. On the standard TIMIT corpus, one type of CRBM outperforms HMMs and is comparable with the best other methods, achieving a phone error rate (PER) of 26.7% on the TIMIT core test set.

[1]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[2]  Stephen Cox,et al.  Some statistical issues in the comparison of speech recognition algorithms , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[3]  Anthony J. Robinson,et al.  An application of recurrent nets to phone probability estimation , 1994, IEEE Trans. Neural Networks.

[4]  James R. Glass,et al.  Heterogeneous measurements and multiple classifiers for speech recognition , 1998, ICSLP.

[5]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[6]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[7]  Geoffrey E. Hinton,et al.  Modeling Human Motion Using Binary Latent Variables , 2006, NIPS.

[8]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[9]  Eric Fosler-Lussier,et al.  Combining phonetic attributes using conditional random fields , 2006, INTERSPEECH.

[10]  Geoffrey E. Hinton,et al.  Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[11]  Dong Yu,et al.  Use of Differential Cepstra as Acoustic Features in Hidden Trajectory Modeling for Phonetic Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[12]  Lawrence K. Saul,et al.  Comparison of Large Margin Training to Other Discriminative Methods for Phonetic Recognition by Hidden Markov Models , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[13]  Geoffrey E. Hinton,et al.  3D Object Recognition with Deep Belief Nets , 2009, NIPS.

[14]  Steve Renals,et al.  Speech Recognition Using Augmented Conditional Random Fields , 2009, IEEE Transactions on Audio, Speech, and Language Processing.