Social Link Prediction in Online Social Tagging Systems

Social networks have become a popular medium for people to communicate and distribute ideas, content, news, and advertisements. Social content annotation has naturally emerged as a method of categorizing and filtering online information. The unrestricted vocabulary from which users choose when annotating content has often led to an explosion in the size of the space in which search is performed. In this article, we propose latent topic models as a principled way of reducing the dimensionality of such data and capturing the dynamics of the collaborative annotation process. We propose three generative processes to model latent user tastes with respect to the resources they annotate with metadata. We show that latent user interests, combined with social clues from the immediate neighborhood of users, can significantly improve social link prediction in the online music social media site Last.fm. Most link prediction methods suffer from severe class imbalance, resulting in low precision and/or recall. In contrast, our proposed classification schemes for social link recommendation achieve high precision and recall not only for the dominant class (nonexistence of a link) but also for the sparse positive instances, which are the most vital in social tie prediction.
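
The class imbalance point can be illustrated with a minimal sketch. It is not the article's classifier or its Last.fm data: the label counts, the helper name `per_class_metrics`, and the "always predict no link" baseline are assumptions made purely for illustration. The sketch shows why precision and recall need to be reported per class when positive links are rare.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall) for one class label (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, made-up labels: 95 user pairs without a link (0), 5 with a link (1).
y_true = [0] * 95 + [1] * 5

# A trivial baseline that never predicts a link.
always_no_link = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, always_no_link)) / len(y_true)
neg_precision, neg_recall = per_class_metrics(y_true, always_no_link, label=0)
pos_precision, pos_recall = per_class_metrics(y_true, always_no_link, label=1)

print(f"accuracy: {accuracy:.2f}")
print(f"no-link class: precision={neg_precision:.2f}, recall={neg_recall:.2f}")
print(f"link class:    precision={pos_precision:.2f}, recall={pos_recall:.2f}")
```

On this toy input the baseline looks strong on overall accuracy and on the dominant class while finding none of the actual links, which is the failure mode the abstract contrasts its results against.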
