BoosTexter: A System for Multiclass Multi-label Text Categorization

This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. We first show how to extend the standard notion of classification by allowing each instance to be associated with multiple labels. We then discuss our approach for multiclass multi-label text categorization, which is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system to automatic call-type identification from unconstrained spoken customer responses.
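
To make the multi-label setting concrete, the following is a minimal sketch, not BoosTexter's implementation. It assumes each instance's label set is encoded as a vector of +1/-1 entries (one per possible label) and that predictions are scored by the fraction of instance-label pairs that disagree with the truth (a Hamming-style loss). The function names and the example call-type labels are hypothetical and are not taken from the abstract.

```python
from typing import List, Sequence, Set


def encode_labels(label_set: Set[str], all_labels: Sequence[str]) -> List[int]:
    """Encode a label set as a +1/-1 vector with one entry per possible label."""
    return [1 if label in label_set else -1 for label in all_labels]


def hamming_loss(
    true_sets: Sequence[Set[str]],
    predicted_sets: Sequence[Set[str]],
    all_labels: Sequence[str],
) -> float:
    """Fraction of (instance, label) pairs where prediction and truth disagree."""
    mistakes = 0
    for truth, pred in zip(true_sets, predicted_sets):
        y = encode_labels(truth, all_labels)
        h = encode_labels(pred, all_labels)
        mistakes += sum(1 for a, b in zip(y, h) if a != b)
    return mistakes / (len(true_sets) * len(all_labels))


# Hypothetical label inventory and data, for illustration only.
ALL_LABELS = ["billing", "collect", "rate"]
truth = [{"billing", "rate"}, {"collect"}]
pred = [{"billing"}, {"collect", "rate"}]

if __name__ == "__main__":
    print(encode_labels(truth[0], ALL_LABELS))
    print(hamming_loss(truth, pred, ALL_LABELS))
```

Under this encoding, an instance with several labels is an ordinary vector with more than one +1 entry, so a single-label problem is just the special case with exactly one +1 per instance.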
