Improving category specific Web search by learning query modifications

Users looking for documents within specific categories may have difficulty locating valuable documents using general-purpose search engines. We present an automated method for learning query modifications that can dramatically improve precision when locating pages within specified categories using Web search engines. We also present a classification procedure that can recognize pages in a specific category with high precision, using textual content, text location, and HTML structure. Evaluation shows that the approach is highly effective for locating personal homepages and calls for papers. These algorithms are used to improve category-specific search in the Inquirus 2 search engine.
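The abstract does not spell out how query modifications are learned, so the following is only a minimal sketch of the general idea: score each candidate modification by the precision of the results it returns, as judged by a category classifier. The names `rank_modifications`, `toy_search`, and `toy_is_homepage`, and the toy corpus, are hypothetical stand-ins and are not taken from the paper or from Inquirus 2.

```python
from typing import Callable, List, Tuple

def precision_of_modification(
    base_query: str,
    modifier: str,
    search: Callable[[str], List[str]],
    in_category: Callable[[str], bool],
    top_k: int = 10,
) -> float:
    """Fraction of the top_k results for the modified query that are in the category."""
    results = search(f"{base_query} {modifier}".strip())[:top_k]
    if not results:
        return 0.0
    hits = sum(1 for page in results if in_category(page))
    return hits / len(results)

def rank_modifications(
    base_query: str,
    modifiers: List[str],
    search: Callable[[str], List[str]],
    in_category: Callable[[str], bool],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Sort candidate modifiers by estimated precision, best first."""
    scored = [
        (m, precision_of_modification(base_query, m, search, in_category, top_k))
        for m in modifiers
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins (assumptions for illustration only): a four-page corpus,
# an AND-matching search function, and a keyword rule in place of a real
# page classifier.
TOY_PAGES = [
    "welcome to my home page research interests and publications smith",
    "smith hardware store opening hours",
    "my home page hobbies photos smith family",
    "smith and sons company annual report",
]

def toy_search(query: str) -> List[str]:
    terms = query.lower().split()
    return [p for p in TOY_PAGES if all(t in p.split() for t in terms)]

def toy_is_homepage(page: str) -> bool:
    words = page.split()
    return "home" in words and "page" in words

ranking = rank_modifications(
    "smith", ["", "home page", "report"], toy_search, toy_is_homepage
)
print(ranking)
```

In this toy setting, appending "home page" to the query narrows the results to pages the stand-in classifier accepts, so that modifier ranks first; a real system would replace both stand-ins with an actual search engine and a trained classifier.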
