WWW 2009 MADRID! Track: Data Mining / Session: Graph Algorithms Fast Dynamic Reranking in Large Graphs

In this paper we consider the problem of re-ranking search results by incorporating user feedback. We present a graph theoretic measure for discriminating irrelevant results from relevant results using a few labeled examples provided by the user. The key intuition is that nodes relatively closer (in graph topology) to the relevant nodes than the irrelevant nodes are more likely to be relevant. We present a simple sampling algorithm to evaluate this measure at specific nodes of interest, and an efficient branch and bound algorithm to compute the top k nodes from the entire graph under this measure. On quantifiable prediction tasks the introduced measure outperforms other diffusion-based proximity measures which take only the positive relevance feedback into account. On the Entity-Relation graph built from the authors and papers of the entire DBLP citation corpus (1.4 million nodes and 2.2 million edges) our branch and bound algorithm takes about 1.5 seconds to retrieve the top 10 nodes w.r.t. this measure with 10 labeled nodes.

[1]  Hector Garcia-Molina,et al.  Combating Web Spam with TrustRank , 2004, VLDB.

[2]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[3]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[4]  Vagelis Hristidis,et al.  ObjectRank: Authority-Based Keyword Search in Databases , 2004, VLDB.

[5]  Soumen Chakrabarti,et al.  Dynamic personalized pagerank in entity-relation graphs , 2007, WWW '07.

[6]  Yi Zhang,et al.  Incorporating Diversity and Density in Active Learning for Relevance Feedback , 2007, ECIR.

[7]  Purnamrita Sarkar,et al.  A Tractable Approach to Finding Closest Truncated-commute-time Neighbors in Large Graphs , 2007, UAI.

[8]  Leo Grady,et al.  Random Walks for Image Segmentation , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[10]  D. Aldous,et al.  Chapter 3 Reversible Markov Chains , 1994 .

[11]  Shang-Hua Teng,et al.  Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems , 2003, STOC '04.

[12]  Hang Li,et al.  Ranking refinement and its application to information retrieval , 2008, WWW.

[13]  Gary L. Miller,et al.  A linear work, O(n1/6) time, parallel algorithm for solving planar Laplacians , 2007, SODA '07.

[14]  Zoubin Ghahramani,et al.  Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions , 2003, ICML.

[15]  Ravi Kumar,et al.  Anchor-based proximity measures , 2007, WWW '07.

[16]  Carlos Castillo,et al.  Web spam identification through content and hyperlinks , 2008, AIRWeb '08.

[17]  Purnamrita Sarkar,et al.  Fast incremental proximity search in large graphs , 2008, ICML '08.

[18]  Marc Najork,et al.  Detecting spam web pages through content analysis , 2006, WWW '06.

[19]  Dani Lischinski,et al.  Colorization using optimization , 2004, ACM Trans. Graph..