Application of Deep Belief Networks for Natural Language Understanding

Applications of Deep Belief Nets (DBNs) to various problems have been the subject of a number of recent studies, ranging from image classification and speech recognition to audio classification. In this study we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy layer-wise pretraining method that uses an efficient learning algorithm called Contrastive Divergence (CD). CD allows DBNs to learn a multi-layer generative model from unlabeled data, and the features discovered by this model are then used to initialize a feed-forward neural network that is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: Support Vector Machines (SVMs), Boosting, and Maximum Entropy (MaxEnt). The plain DBN-based model gives call-routing classification accuracy equal to the best of the other models. However, using additional unlabeled data for DBN pre-training and combining the DBN-learned features with the original features yields significant gains over SVMs, which in turn perform better than both MaxEnt and Boosting.
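To make the pre-training-then-fine-tuning recipe concrete, the following is a minimal sketch (not the authors' implementation) of greedy layer-wise RBM training with CD-1 on unlabeled data, followed by supervised fine-tuning of the resulting feed-forward network with backpropagation. The bag-of-words data, layer sizes, learning rates, and epoch counts are hypothetical placeholders chosen only for illustration.

```python
# Minimal sketch of DBN pre-training (stacked RBMs, CD-1) and
# backpropagation fine-tuning. All data and hyperparameters are
# hypothetical; this is not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def cd1_update(self, v0, lr=0.05):
        # Positive phase: hidden probabilities given the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: one step of Gibbs sampling (CD-1).
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        # Approximate log-likelihood gradient (Contrastive Divergence).
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

# Hypothetical unlabeled bag-of-words vectors (binary word presence).
X_unlabeled = (rng.random((500, 100)) < 0.1).astype(float)

# Greedy layer-wise pre-training: each RBM is trained on the
# hidden activations of the RBM below it.
layer_sizes = [100, 64, 32]
rbms, layer_input = [], X_unlabeled
for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_vis, n_hid)
    for epoch in range(10):
        rbm.cd1_update(layer_input)
    rbms.append(rbm)
    layer_input = rbm.hidden_probs(layer_input)

# The learned weights initialize a feed-forward classifier that is
# fine-tuned with backpropagation on a (hypothetical) labeled set.
n_classes = 5
X_labeled = (rng.random((200, 100)) < 0.1).astype(float)
y_labeled = rng.integers(0, n_classes, size=200)

Ws = [rbm.W.copy() for rbm in rbms]
bs = [rbm.b_h.copy() for rbm in rbms]
W_out = 0.01 * rng.standard_normal((layer_sizes[-1], n_classes))
b_out = np.zeros(n_classes)

for epoch in range(50):
    # Forward pass through the pre-trained sigmoid layers.
    acts = [X_labeled]
    for W, b in zip(Ws, bs):
        acts.append(sigmoid(acts[-1] @ W + b))
    logits = acts[-1] @ W_out + b_out
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Backward pass: softmax cross-entropy delta, then chain rule.
    delta = probs.copy()
    delta[np.arange(len(y_labeled)), y_labeled] -= 1.0
    delta /= len(y_labeled)
    grad_W_out = acts[-1].T @ delta
    grad_b_out = delta.sum(axis=0)
    delta = (delta @ W_out.T) * acts[-1] * (1 - acts[-1])
    W_out -= 0.1 * grad_W_out
    b_out -= 0.1 * grad_b_out
    for i in reversed(range(len(Ws))):
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            delta = (delta @ Ws[i].T) * acts[i] * (1 - acts[i])
        Ws[i] -= 0.1 * grad_W
        bs[i] -= 0.1 * grad_b
```

The key point of the recipe is the split into two phases: the RBM stack is trained without labels to discover features, and only the final fine-tuning step uses the labeled call-routing data, which is why additional unlabeled data can help the pre-training phase.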
