PageRank, HITS and a unified framework for link analysis

Two popular link-based webpage ranking algorithms are (i) PageRank [1] and (ii) HITS (Hypertext Induced Topic Selection) [3]. HITS distinguishes hubs from authorities and computes their scores in a mutually reinforcing way. PageRank normalizes hyperlink weights and takes the equilibrium distribution of random surfers as the citation score. We generalize and combine these key concepts into a unified framework, in which we prove that the rankings produced by PageRank and HITS are both highly correlated with the rankings by in-degree and out-degree.
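As a rough illustration of the two base algorithms named above (not of the unified framework itself), the following minimal sketch runs power iterations for PageRank and HITS on a small graph and lists in-degree alongside for comparison. The toy graph `TOY_ADJ`, the function names, the damping factor of 0.85, and the fixed iteration count are all assumptions made for this example and do not come from the paper.

```python
# Minimal sketch, assuming a dense 0/1 adjacency matrix where
# adj[i][j] == 1 means page i links to page j.

def pagerank(adj, damping=0.85, iters=100):
    """Random-surfer scores by power iteration (damping value is an assumption)."""
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_deg[i] == 0:
                # Assumed convention: a page with no out-links spreads its score evenly.
                share = damping * rank[i] / n
                for j in range(n):
                    new[j] += share
            else:
                # Hyperlink weight normalization: split score over out-links.
                share = damping * rank[i] / out_deg[i]
                for j in range(n):
                    if adj[i][j]:
                        new[j] += share
        rank = new
    return rank

def hits(adj, iters=100):
    """Hub and authority scores computed in a mutually reinforcing loop."""
    n = len(adj)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        # Authority of j: sum of hub scores of pages linking to j.
        auth = [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)]
        norm = sum(a * a for a in auth) ** 0.5
        auth = [a / norm for a in auth]
        # Hub of i: sum of authority scores of pages that i links to.
        hub = [sum(adj[i][j] * auth[j] for j in range(n)) for i in range(n)]
        norm = sum(h * h for h in hub) ** 0.5
        hub = [h / norm for h in hub]
    return hub, auth

def in_degree(adj):
    n = len(adj)
    return [sum(adj[i][j] for i in range(n)) for j in range(n)]

# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2
TOY_ADJ = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    pr = pagerank(TOY_ADJ)
    hub, auth = hits(TOY_ADJ)
    for page, (p, a, d) in enumerate(zip(pr, auth, in_degree(TOY_ADJ))):
        print(f"page {page}: pagerank={p:.3f} authority={a:.3f} in-degree={d}")
```

On this toy graph the page with the most in-links also receives the highest PageRank and authority score, which is the kind of agreement with in-degree that the abstract refers to; a four-node example is of course no evidence for the general claim.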

[1] Albert, et al. Emergence of scaling in random networks, 1999, Science.

[2] Ravi Kumar, et al. Trawling the Web for Emerging Cyber-Communities, 1999, Comput. Networks.

[3] Eli Upfal, et al. Stochastic models for the Web graph, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[4] Allan Borodin, et al. Finding authorities and hubs from link structures on the World Wide Web, 2001, WWW '01.

[5] Fan Chung Graham, et al. A random graph model for massive graphs, 2000, STOC '00.

[6] M. M. Kessler. Bibliographic coupling between scientific papers, 1963.

[7] David Cohn, et al. Learning to Probabilistically Identify Authoritative Documents, 2000, ICML.

[8] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.

[9] Michael I. Jordan, et al. Stable algorithms for link analysis, 2001, SIGIR '01.

[10] C. Lee Giles, et al. Efficient identification of Web communities, 2000, KDD '00.

[11] Jon M. Kleinberg, et al. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, 1998, Comput. Networks.

[12] Krishna Bharat, et al. Improved algorithms for topic distillation in a hyperlinked environment, 1998, SIGIR '98.

[13] Shlomo Moran, et al. SALSA: the stochastic approach for link-structure analysis, 2001, TOIS.

[14] Jon M. Kleinberg, et al. Inferring Web communities from link topology, 1998, HYPERTEXT '98.

[15] Amos Fiat, et al. Web search via hub synthesis, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[16] Henry G. Small, et al. Co-citation in the scientific literature: A new measure of the relationship between two documents, 1973, J. Am. Soc. Inf. Sci.

[17] Chris H. Q. Ding, et al. Automatic topic identification using webpage clustering, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[18] Andrei Z. Broder, et al. Graph structure in the Web, 2000, Comput. Networks.

[19] Moni Naor, et al. Rank aggregation methods for the Web, 2001, WWW '01.

[20] Hongyuan Zha, et al. Analysis of hubs and authorities on the web, 2001.