Leveraging user libraries to bootstrap collaborative filtering

We introduce a novel graphical model, the collaborative score topic model (CSTM), for personal recommendations of textual documents. CSTM's chief novelty lies in its learned model of the individual library, or set of documents, associated with each user. Overall, CSTM is a joint directed probabilistic model of user-item scores (ratings) and of the textual side information in the user libraries and the items. Modeling both the scores and the text generatively allows CSTM to perform well across a wide variety of data regimes, smoothly combining the side information with observed ratings as the number of ratings available for a given user ranges from none to many. Experiments on real-world datasets demonstrate CSTM's performance. We further demonstrate its utility in an application for personal recommendations of posters, which we deployed at the NIPS 2013 conference.
