The use of hierarchic clustering in information retrieval

Abstract

We introduce information retrieval strategies based on automatic hierarchic clustering of documents. We discuss the evaluation of retrieval strategies and show, using a subset of the Cranfield Aeronautics document collection, that cluster-based retrieval strategies can be devised which are as effective as linear associative retrieval strategies and much more efficient. Finally, we outline how cluster-based retrieval may be extended to large, growing document collections and indicate some ways in which the effectiveness of cluster-based retrieval strategies may be improved.
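The abstract contrasts cluster-based retrieval over a document hierarchy with linear associative retrieval, which matches the query against every document. The abstract does not specify the clustering method, matching function, or search strategy. The sketch below is therefore only an illustration under stated assumptions: documents are sets of index terms, similarity is the Dice coefficient, the hierarchy is built by single-link clustering, each cluster is represented by the union of its members' terms, and the top-down search stops at a hypothetical `max_docs` cluster size. All function and parameter names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass, field
from itertools import combinations


def dice(a: frozenset, b: frozenset) -> float:
    """Dice coefficient between two term sets (assumed matching function)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


@dataclass
class Node:
    docs: list            # indices of the documents in this cluster
    terms: frozenset      # cluster representative: union of member terms (assumption)
    children: list = field(default_factory=list)


def single_link_tree(docs):
    """Build a single-link dendrogram by merging pairs in order of decreasing similarity."""
    sets = [frozenset(d) for d in docs]
    clusters = {i: Node([i], s) for i, s in enumerate(sets)}
    parent = list(range(len(sets)))  # union-find over document indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Kruskal-style: the most similar pair linking two clusters merges them.
    pairs = sorted(combinations(range(len(sets)), 2),
                   key=lambda p: dice(sets[p[0]], sets[p[1]]), reverse=True)
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        a, b = clusters.pop(ri), clusters.pop(rj)
        parent[rj] = ri
        clusters[ri] = Node(a.docs + b.docs, a.terms | b.terms, [a, b])
    (root,) = clusters.values()  # assumes at least one document
    return root


def linear_search(docs, query, k):
    """Linear associative retrieval: match the query against every document."""
    q = frozenset(query)
    scores = [dice(q, frozenset(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


def cluster_search(root, docs, query, max_docs):
    """Top-down search: follow the best-matching child until a small cluster is reached."""
    q = frozenset(query)
    node = root
    while node.children and len(node.docs) > max_docs:
        node = max(node.children, key=lambda c: dice(q, c.terms))
    # Rank only the documents in the selected cluster.
    return sorted(node.docs, key=lambda i: dice(q, frozenset(docs[i])), reverse=True)
```

For example, with the four documents `{"wing", "lift", "flow"}`, `{"wing", "lift", "drag"}`, `{"boundary", "layer", "flow"}` and `{"boundary", "layer", "heat"}`, the query `{"wing", "lift"}` retrieves documents 0 and 1 under both strategies when `k=2` and `max_docs=2`. On a large collection the cluster search matches the query only against the clusters along one root-to-cluster path plus the members of the final cluster, which is where an efficiency gain of the kind the abstract reports would come from.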
