Exploiting Hierarchy in Text Categorization

With the recent dramatic increase in electronic access to documents, text categorization (the task of assigning topics to a given document) has become central to the information sciences and knowledge management. This article exploits the structure of the semantic space of topics to improve text categorization performance: topics can be grouped by meaning into “meta-topics”; for example, gold, silver, and copper are all metals. The proposed architecture matches the hierarchical structure of the topic space, in contrast to a flat model that ignores that structure. It accommodates both single and multiple topic assignments for each document, and its probabilistic interpretation allows its predictions to be combined in a principled way with information from other sources. The first level of the architecture predicts the probabilities of the meta-topic groups, which lets the individual models for each topic on the second level focus on finer discriminations within their group. An evaluation of a two-level implementation on the Reuters-22173 testbed of newswire articles shows that the improvement is most significant for rare classes.
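
The two-level idea can be illustrated with a minimal sketch. It is not the article's implementation: it assumes that the first level yields a probability per meta-topic, that each second-level model yields a probability for its topic given the meta-topic, and that the two are combined by multiplication, which is one natural reading of the probabilistic interpretation. The "grains" group, all numeric values, the 0.5 threshold, and the names `HIERARCHY`, `combine`, and `assign` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical hierarchy. "metals" follows the example in the abstract;
# "grains" and its topics are invented for illustration.
HIERARCHY: Dict[str, List[str]] = {
    "metals": ["gold", "silver", "copper"],
    "grains": ["wheat", "corn"],
}


def combine(
    meta_probs: Dict[str, float],
    within_probs: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Assumed combination rule:
    P(topic | doc) = P(meta | doc) * P(topic | meta, doc)."""
    scores: Dict[str, float] = {}
    for meta, topics in HIERARCHY.items():
        for topic in topics:
            scores[topic] = meta_probs[meta] * within_probs[meta][topic]
    return scores


def assign(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multiple-topic assignment: keep every topic at or above the threshold."""
    return sorted(t for t, p in scores.items() if p >= threshold)


# Made-up outputs of the first-level model for one document.
meta_probs = {"metals": 0.9, "grains": 0.1}

# Made-up outputs of the second-level models. Each topic is scored
# independently, so values within a group need not sum to one.
within_probs = {
    "metals": {"gold": 0.8, "silver": 0.7, "copper": 0.05},
    "grains": {"wheat": 0.6, "corn": 0.3},
}

scores = combine(meta_probs, within_probs)
labels = assign(scores)  # ['gold', 'silver']
```

In this sketch, "wheat" scores 0.6 within its group but only 0.06 overall, because the first level considers the "grains" group unlikely for this document. A single-topic assignment could instead take the highest-scoring topic.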
