Context-sensitive learning methods for text categorization

Two recently implemented machine-learning algorithms, RIPPER and sleeping-experts for phrases, are evaluated on a number of large text categorization problems. Both algorithms construct classifiers that allow the “context” of a word w to affect how (or even whether) the presence or absence of w contributes to a classification. However, RIPPER and sleeping-experts differ radically in many other respects: they hold different notions of what constitutes a context, combine contexts into a classifier in different ways, use different methods to search for a combination of contexts, and apply different criteria for deciding which contexts to include in that combination. In spite of these differences, both RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods. We view this result as confirmation of the usefulness of classifiers that represent contextual information.
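To make the two notions of “context” concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The function `rule_set_predict` only shows how a learned RIPPER-style rule (a set of words that must all be present) lets a word count only alongside other words; it performs no rule learning or pruning. The class `SleepingExperts` follows the general sleeping-experts scheme under these assumptions: contexts are contiguous n-grams (the phrases used by the method evaluated here may be sparse), each phrase owns two experts that always predict 1 and 0 respectively, loss is absolute, weights are updated multiplicatively with a hypothetical parameter `beta`, and awake experts are renormalized so their total weight is unchanged. All names are hypothetical.

```python
from collections import defaultdict


def rule_set_predict(rules, tokens):
    """RIPPER-style rule set (illustrative only; no rule learning here).

    Each rule is a set of words that must all appear. A word such as
    "cheap" contributes only in the context of the other words in its rule.
    """
    present = set(tokens)
    return any(rule <= present for rule in rules)


def phrases(tokens, max_len=2):
    """Contexts as contiguous n-grams up to max_len (assumption: sparse
    phrases with gaps are not modeled in this sketch)."""
    out = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            out.add(tuple(tokens[i:i + n]))
    return out


class SleepingExperts:
    """Hypothetical sketch: each phrase owns two experts, one always
    predicting 1 and one always predicting 0. Only experts whose phrase
    occurs in the current document are awake."""

    def __init__(self, beta=0.5, max_len=2):
        assert 0.0 < beta < 1.0
        self.beta = beta            # hypothetical update parameter
        self.max_len = max_len
        # phrase -> [weight of expert predicting 0, weight of expert predicting 1]
        self.w = defaultdict(lambda: [1.0, 1.0])

    def predict(self, tokens):
        """Weighted average of the awake experts' predictions, in [0, 1]."""
        awake = phrases(tokens, self.max_len)
        # .get does not insert entries for phrases never seen in training
        weights = [self.w.get(p, [1.0, 1.0]) for p in awake]
        total = sum(w0 + w1 for w0, w1 in weights)
        if total == 0.0:
            return 0.5              # no awake experts: abstain
        return sum(w1 for _, w1 in weights) / total

    def update(self, tokens, label):
        """Multiplicative update of awake experts; sleeping ones are untouched."""
        awake = phrases(tokens, self.max_len)
        before = sum(sum(self.w[p]) for p in awake)
        for p in awake:
            for pred in (0, 1):
                loss = abs(pred - label)        # absolute loss: 0 or 1
                self.w[p][pred] *= self.beta ** loss
        after = sum(sum(self.w[p]) for p in awake)
        if after > 0.0:
            scale = before / after      # keep total awake weight unchanged
            for p in awake:
                self.w[p][0] *= scale
                self.w[p][1] *= scale
```

For example, after a few calls to `update` on tokenized positive and negative documents, `predict` returns a score above 0.5 for text sharing phrases with the positive examples, and exactly 0.5 for text whose phrases were never seen, since all of its experts still carry equal weight. Thresholding the score (say at 0.5) to obtain a label is an illustrative choice, not one taken from the source.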
