Latent-Descriptor Clustering for Unsupervised POS Induction

We present a novel approach to distributionalonly, fully unsupervised, POS tagging, based on an adaptation of the EM algorithm for the estimation of a Gaussian mixture. In this approach, which we call Latent-Descriptor Clustering (LDC), word types are clustered using a series of progressively more informative descriptor vectors. These descriptors, which are computed from the immediate left and right context of each word in the corpus, are updated based on the previous state of the cluster assignments. The LDC algorithm is simple and intuitive. Using standard evaluation criteria for unsupervised POS tagging, LDC shows a substantial improvement in performance over state-of-the-art methods, along with a several-fold reduction in computational cost.

[1]  Mark Johnson,et al.  Why Doesn’t EM Find Good HMM POS-Taggers? , 2007, EMNLP.

[2]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[3]  Hinrich Schütze,et al.  Distributional Part-of-Speech Tagging , 1995, EACL.

[4]  John DeNero,et al.  Painless Unsupervised Learning with Features , 2010, NAACL.

[5]  Ari Rappoport,et al.  Improved Unsupervised POS Induction through Prototype Discovery , 2010, ACL.

[6]  Thomas L. Griffiths,et al.  A fully Bayesian approach to unsupervised part-of-speech tagging , 2007, ACL.

[7]  Jun'ichi Tsujii,et al.  Bidirectional Inference with the Easiest-First Strategy for Tagging Sequence Data , 2005, HLT.

[8]  Ben Taskar,et al.  Posterior vs Parameter Sparsity in Latent Variable Models , 2009, NIPS.

[9]  Eugene Charniak,et al.  Evaluating Unsupervised Part-of-Speech Tagging for Grammar Induction , 2008, COLING.

[10]  Alexander Clark Unsupervised induction of stochastic context-free grammars using distributional clustering , 2001, CoNLL.

[11]  Jianfeng Gao,et al.  A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers , 2008, EMNLP.

[12]  Dan Klein,et al.  Prototype-Driven Learning for Sequence Models , 2006, NAACL.

[13]  Noah A. Smith,et al.  Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.

[14]  Mark Johnson,et al.  SVD and Clustering for Unsupervised POS Tagging , 2010, ACL.

[15]  Hermann Ney,et al.  On structuring probabilistic dependences in stochastic language modelling , 1994, Comput. Speech Lang..

[16]  Ari Rappoport,et al.  Improved Unsupervised POS Induction Using Intrinsic Clustering Quality and a Zipfian Constraint , 2010, CoNLL.

[17]  Alexander Clark,et al.  Combining Distributional and Morphological Information for Part of Speech Induction , 2003, EACL.

[18]  Kevin Knight,et al.  Minimized Models for Unsupervised Part-of-Speech Tagging , 2009, ACL/IJCNLP.

[19]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[20]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.