Applying Genetic Algorithms to the Feature Selection Problem in Information Retrieval

The demand for accuracy and speed in Information Retrieval has made clear the need for a good classification of the large document collections held in databases and on Web servers. Representing documents in the vector space model, with terms as features, makes it possible to apply Machine Learning techniques. This paper presents a filter method that selects the most relevant features before the classification process. A Genetic Algorithm (GA) is used to search for solutions in the space of relevant features. The method has been implemented and some preliminary experiments have been carried out. Applying this technique to the vector space model in Information Retrieval is outlined as future work.
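As an illustration of the general idea only, a GA-based filter for feature selection can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated in this work: the per-feature `relevance` scores, the size `penalty`, the tournament selection, the one-point crossover, and every parameter value are hypothetical choices made for the example. Each individual is a bit mask over the features, and the fitness is computed from the scores alone, without training a classifier, which is what makes the approach a filter.

```python
import random


def fitness(mask, relevance, penalty=0.5):
    """Assumed filter-style fitness: summed relevance of the selected
    features minus a fixed penalty per selected feature."""
    return sum(r for bit, r in zip(mask, relevance) if bit) - penalty * sum(mask)


def select_features(relevance, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Search for a feature mask with a simple generational GA.

    Requires at least two features (one-point crossover needs a cut point).
    """
    rng = random.Random(seed)
    n = len(relevance)

    def score(mask):
        return fitness(mask, relevance)

    # Random initial population of bit masks (1 = feature kept).
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)

    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask over unchanged
        while len(new_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            # One-point crossover.
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=score)

    return best


if __name__ == "__main__":
    # Hypothetical relevance scores for six terms.
    scores = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2]
    mask = select_features(scores)
    print(mask, fitness(mask, scores))
```

In the vector space model, each position of the mask would correspond to one term of the vocabulary, and the scores would come from whatever relevance measure the filter adopts; the values above are invented for the example.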