Many web-based application areas must infer label distributions starting from a small set of sparse, noisy labels. Examples include searching for, recommending, and advertising against image, audio, and video content. These labeling problems must handle millions of interconnected entities (users, domains, content segments) and thousands of competing labels (interests, tags, recommendations, topics). Previous work has shown that graph-based propagation can be very effective at finding the best label distribution across nodes, starting from partial information and a weighted connection graph. In their work on video recommendations, Baluja et al. [1] showed high-quality results using Adsorption, a normalized propagation process. An important step in the original formulation of Adsorption was re-normalizing the label vector associated with each node after every propagation step. This interleaved normalization forced all label distributions to be computed in synchrony so that the normalization could be determined correctly. It also prevented the use of standard linear-algebra methods, such as the stabilized biconjugate gradient method (BiCGStab) and Gaussian elimination. This paper presents a method that replaces the interleaved normalization with a single pre-normalization, performed once before the main propagation process starts. This allows the use of selective label computation (label slicing) as well as large-matrix solution methods. As a result, much larger graphs and label sets can be handled than in the original formulation, and more accurate solutions can be found in fewer propagation steps. We also report results from using pre-normalized Adsorption for topic labeling of web domains, using label slicing and BiCGStab.

Keywords: graph propagation, large-scale labeling, stabilized bi-conjugate gradient method, Gaussian elimination, topic discovery, web domains.
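The difference between interleaved normalization and a single pre-normalization can be illustrated with a minimal sketch. It assumes a simplified propagation rule, y = mix * P y + (1 - mix) * s. Here P is the row-normalized edge-weight matrix, computed once, and s is the seed vector for a single label. The global mixing weight `mix` in (0, 1) is a hypothetical parameter. This is not the paper's exact Adsorption update. Because the assumed rule is linear and has no per-step normalization across labels, each label can be solved on its own (label slicing) as the linear system (I - mix * P) y = (1 - mix) * s. The sketch solves it with an unpreconditioned BiCGStab [7], and Gaussian elimination would solve the same system directly. The helper names (`pre_normalize`, `solve_label`, `propagate`), the toy graph, and the label names are illustrative assumptions.

```python
# Minimal sketch under an ASSUMED model (not the paper's exact Adsorption update):
#   y = mix * P y + (1 - mix) * s   <=>   (I - mix * P) y = (1 - mix) * s
# P is the row-normalized weight matrix, computed ONCE (pre-normalization);
# s is the seed vector for a single label; 0 < mix < 1.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def pre_normalize(weights: Matrix) -> Matrix:
    """Row-normalize edge weights once, before propagation starts."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total > 0 else 0.0 for w in row])
    return out


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def bicgstab(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 200) -> Vector:
    """Unpreconditioned BiCGStab for a x = b, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = b[:]          # residual b - a x for x = 0
    r_hat = r[:]      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        if rho_new == 0.0:  # zero residual (or breakdown): stop
            break
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(a, p)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:  # half-step already accurate enough
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        t = matvec(a, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 < tol:
            break
    return x


def solve_label(trans: Matrix, seed: Vector, mix: float) -> Vector:
    """Label slicing: solve for ONE label's scores, independent of all others."""
    n = len(seed)
    a = [[(1.0 if i == j else 0.0) - mix * trans[i][j] for j in range(n)]
         for i in range(n)]
    b = [(1.0 - mix) * si for si in seed]
    return bicgstab(a, b)


def propagate(trans: Matrix, seed: Vector, mix: float, steps: int) -> Vector:
    """Plain iterative propagation of the same rule, for comparison."""
    y = seed[:]
    for _ in range(steps):
        ty = matvec(trans, y)
        y = [mix * tyi + (1.0 - mix) * si for tyi, si in zip(ty, seed)]
    return y


# Hypothetical toy graph with symmetric edge weights.
weights = [[0, 2, 1, 0], [2, 0, 1, 1], [1, 1, 0, 3], [0, 1, 3, 0]]
trans = pre_normalize(weights)  # done once, shared by every label
seeds = {"sports": [1.0, 0.0, 0.0, 0.0], "news": [0.0, 0.0, 0.0, 1.0]}
scores = {label: solve_label(trans, s, mix=0.8) for label, s in seeds.items()}
```

Each label is solved independently, so only the labels of interest need to be computed, and the solves can run in parallel. In this simplified model, if per-node label distributions are needed, the scores can be normalized once after solving rather than after every step.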
[1] Partha Pratim Talukdar et al. Weakly-Supervised Acquisition of Labeled Class Instances using Graph Random Walks. EMNLP, 2008.
[2] Shankar Kumar et al. Video suggestion and discovery for YouTube: taking random walks through the view graph. WWW, 2008.
[3] Sanjay Ghemawat et al. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.
[4] Zoubin Ghahramani et al. Learning from labeled and unlabeled data with label propagation. 2002.
[5] Xian-Sheng Hua et al. Video search re-ranking via multi-graph propagation. ACM Multimedia, 2007.
[6] Jason Baldridge et al. Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph. ULNLP@EMNLP, 2011.
[7] Henk A. van der Vorst et al. Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems. SIAM J. Sci. Comput., 1992.
[8] Shumeet Baluja et al. VisualRank: Applying PageRank to Large-Scale Image Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.