Extracting Phrase Patterns with Minimum Redundancy for Unsupervised Speaker Role Classification

This paper addresses the problem of learning phrase patterns for unsupervised speaker role classification. Phrase patterns are automatically extracted from large corpora, and redundant patterns are removed via a graph pruning algorithm. In experiments on English and Mandarin talk shows, the use of phrase patterns results in an increase of role classification accuracy over n-gram lexical features, and more compact phrase pattern lists are obtained due to the redundancy removal.

[1]  Mari Ostendorf,et al.  Unsupervised broadcast conversation speaker role labeling , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Ming Zhou,et al.  Detecting Erroneous Sentences using Automatically Mined Sequential Patterns , 2007, ACL.

[3]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[5]  Umeshwar Dayal,et al.  PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth , 2001, ICDE 2001.

[6]  Mari Ostendorf,et al.  Variable n-grams and extensions for conversational speech language modeling , 2000, IEEE Trans. Speech Audio Process..

[7]  Yang Liu,et al.  Initial Study on Automatic Identification of Speaker Role in Broadcast News Speech , 2006, NAACL.

[8]  Julia Hirschberg,et al.  The Rules Behind Roles: Identifying Speaker Role in Radio Broadcasts , 2000, AAAI/IAAI.