Deep Belief Networks for phone recognition

Hidden Markov Models (HMMs) have been the state-of-the-art technique for acoustic modeling despite their unrealistic independence assumptions and the very limited representational capacity of their hidden states. There are many proposals in the research community for deeper models that are capable of modeling the many types of variability present in the speech generation process. Deep Belief Networks (DBNs) have recently proved to be very effective for a variety of machine learning problems, and this paper applies DBNs to acoustic modeling. On the standard TIMIT corpus, DBNs consistently outperform other techniques, and the best DBN achieves a phone error rate (PER) of 23.0% on the TIMIT core test set.
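To make the DBN recipe concrete, the sketch below shows the general idea of greedy layer-wise pretraining with restricted Boltzmann machines trained by one-step contrastive divergence (CD-1). It is a minimal illustration only: it assumes binary (Bernoulli-Bernoulli) units throughout, whereas acoustic features would normally need a Gaussian-Bernoulli first layer, and the class, function names, learning rate, and batch size are hypothetical choices, not the authors' exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0):
        """One contrastive-divergence step on a mini-batch v0."""
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(v0.dtype)
        v1 = self.visible_probs(h0_sample)   # one-step reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W   += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

def pretrain_dbn(data, layer_sizes, epochs=10, batch=128):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    activations of the previous one, then the stack is stacked into a DBN."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(x), batch):
                rbm.cd1_update(x[i:i + batch])
        rbms.append(rbm)
        x = rbm.hidden_probs(x)   # propagate data up to train the next layer
    return rbms
```

After pretraining, the stacked weights would typically initialize a feed-forward network whose softmax output layer predicts HMM phone states, with the whole network then fine-tuned discriminatively by backpropagation.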
