Keyword Combination Extraction in Text Categorization Based on Ant Colony Optimization

With the growing number of documents in digital form, automated text categorization (TC) has become increasingly promising over the last ten years. A TC system can automatically assign a document to the most suitable category, but the reason for such an assignment is usually unknown to users. To make a TC system interpretable, it is necessary to select a group of keywords, termed a keyword combination, to describe each text category. In this paper, we propose a novel algorithm, keyword combination extraction based on ant colony optimization (KCEACO), to search for the optimal keyword combination of a target category. By extending traditional feature selection techniques, we design an evaluation function for scoring a keyword combination. This function takes into account the relationships among different keywords. Experimental results show that KCEACO can efficiently find the optimal keyword combination from a large number of candidate combinations.
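
As an illustration only, the following Python sketch shows how an ant-colony search over keyword combinations might look. It is not the KCEACO algorithm itself: the scoring function `evaluate` (total relevance minus a pairwise redundancy penalty), the `relevance` and `redundancy` tables, the fixed pheromone deposit, and all parameter names and values are assumptions introduced for this example.

```python
import random
from itertools import combinations


def evaluate(combo, relevance, redundancy, penalty=1.0):
    """Hypothetical score: summed keyword relevance minus pairwise redundancy."""
    gain = sum(relevance[k] for k in combo)
    overlap = sum(redundancy.get(frozenset(pair), 0.0)
                  for pair in combinations(combo, 2))
    return gain - penalty * overlap


def aco_select(relevance, redundancy, size, ants=10, iterations=30,
               evaporation=0.1, seed=0):
    """Search for a high-scoring keyword combination of a fixed size.

    Assumes every relevance value is positive, because relevance is used
    as the heuristic weight when an ant samples its next keyword.
    """
    rng = random.Random(seed)
    keywords = sorted(relevance)
    pheromone = {k: 1.0 for k in keywords}
    best, best_score = None, float("-inf")

    for _ in range(iterations):
        for _ in range(ants):
            # Each ant builds one combination, keyword by keyword.
            remaining = list(keywords)
            combo = []
            while len(combo) < size:
                weights = [pheromone[k] * relevance[k] for k in remaining]
                pick = rng.choices(remaining, weights=weights, k=1)[0]
                combo.append(pick)
                remaining.remove(pick)
            score = evaluate(combo, relevance, redundancy)
            if score > best_score:
                best, best_score = sorted(combo), score

        # Evaporate all trails, then reinforce the best combination so far.
        for k in keywords:
            pheromone[k] *= 1.0 - evaporation
        for k in best:
            pheromone[k] += 1.0  # fixed deposit, a simplification

    return best, best_score


# Toy data (invented for this example): "goal" and "score" are redundant.
relevance = {"ball": 0.9, "goal": 0.8, "match": 0.7, "score": 0.75, "the": 0.1}
redundancy = {frozenset({"goal", "score"}): 0.6}
best, best_score = aco_select(relevance, redundancy, size=2)
print(best, round(best_score, 3))
```

The redundancy term is what lets the score depend on relationships among keywords rather than on each keyword in isolation; a real evaluation function would be derived from corpus statistics rather than a hand-written table.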
