Link Analysis: Hubs and Authorities on the World Wide Web

Ranking the tens of thousands of webpages retrieved for a user query so that the most informative pages appear at the top is a key information retrieval technology for Web search engines. A popular ranking algorithm is the HITS algorithm of Kleinberg. It exploits the mutually reinforcing relationship between authority and hub webpages on a particular topic by taking into account the structure of the Web graph formed by the hyperlinks between the webpages. In this paper, we give a detailed analysis of the HITS algorithm through a unique combination of probabilistic analysis and matrix algebra. In particular, we show that, to first-order approximation, the ranking given by the HITS algorithm is the same as the ranking obtained by counting inbound and outbound hyperlinks. Using Web graphs of different sizes, we also provide experimental results to illustrate the analysis.
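To make the comparison between HITS scores and link counts concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard HITS iteration (authority scores summed from in-linking hubs, hub scores summed from out-linked authorities, both rescaled to unit length each round). The five-page edge list and the names `hits`, `normalize`, and `top_k` are hypothetical and chosen only for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def hits(edges, n, iterations=50):
    """Run the HITS iteration on a directed graph given as (src, dst) pairs."""
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auths = [0.0] * n
        for src, dst in edges:
            new_auths[dst] += hubs[src]
        # Hub update: sum the updated authority scores of the pages linked to.
        new_hubs = [0.0] * n
        for src, dst in edges:
            new_hubs[src] += new_auths[dst]
        auths, hubs = normalize(new_auths), normalize(new_hubs)
    return auths, hubs


def top_k(scores, k):
    """Return the indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


# Hypothetical toy Web graph: an edge (s, d) means page s links to page d.
edges = [(0, 2), (1, 2), (3, 2), (0, 3), (1, 3), (0, 4)]
n = 5

auths, hubs = hits(edges, n)
in_degree = [sum(1 for _, d in edges if d == i) for i in range(n)]
out_degree = [sum(1 for s, _ in edges if s == i) for i in range(n)]

print(top_k(auths, 3), top_k(in_degree, 3))   # authority ranking vs. inbound counts
print(top_k(hubs, 3), top_k(out_degree, 3))   # hub ranking vs. outbound counts
```

On this toy graph the top three authorities coincide with the three pages having the most inbound links, and the top three hubs with the three pages having the most outbound links. This is only an illustration on one small example; the abstract's claim is a first-order approximation, so the two rankings need not agree exactly on larger Web graphs.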
