On component interactions in two-stage recommender systems

Thanks to their scalability, two-stage recommenders are used by many of today's largest online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce recommendations in two steps: (i) multiple nominators, tuned for low prediction latency, preselect a small subset of candidates from the whole item pool; (ii) a slower but more accurate ranker further narrows down the nominated items and serves them to the user. Despite their popularity, the literature on two-stage recommenders is relatively scarce, and the algorithms are often treated as mere sums of their parts. Such treatment presupposes that two-stage performance is explained by the behavior of the individual components in isolation. This is not the case: using synthetic and real-world data, we demonstrate that interactions between the ranker and the nominators substantially affect overall performance. Motivated by these findings, we derive a generalization lower bound showing that independent nominator training can lead to performance on par with uniformly random recommendations. We find that careful design of item pools, each assigned to a different nominator, alleviates these issues. Since manually searching for a good pool allocation is difficult, we propose to learn one instead using a Mixture-of-Experts based approach. This significantly improves both precision and recall at K.
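To make the pipeline concrete, below is a minimal NumPy sketch of the two-stage setup described above, with a softmax gate standing in for the learned item-pool allocation. All names and model choices here (two_stage_recommend, random linear nominators, a dot-product ranker, a hard argmax pool assignment) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of a two-stage recommender with a gated item-pool
# allocation. All model choices are illustrative stand-ins: the
# nominators are random linear scorers, the ranker is a plain
# dot-product model, and the gate is fixed rather than trained.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_nominators, d, slate_k = 1_000, 4, 16, 10

item_emb = rng.normal(size=(n_items, d))                 # item embeddings
nominator_proj = rng.normal(size=(n_nominators, d, d))   # one cheap model each
gate_w = rng.normal(size=(d, n_nominators))              # MoE-style gate


def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def two_stage_recommend(user_emb, k_per_nominator=50):
    # Pool allocation: the gate assigns each item to one nominator.
    pool = softmax(item_emb @ gate_w).argmax(axis=1)     # (n_items,)

    # Stage 1: each nominator scores only the items in its own pool
    # and preselects its top candidates.
    candidates = []
    for j in range(n_nominators):
        idx = np.flatnonzero(pool == j)
        scores = item_emb[idx] @ (nominator_proj[j] @ user_emb)
        candidates.append(idx[np.argsort(scores)[-k_per_nominator:]])
    candidates = np.unique(np.concatenate(candidates))

    # Stage 2: the (here: exact dot-product) ranker re-scores the
    # nominated items and serves the final slate to the user.
    final_scores = item_emb[candidates] @ user_emb
    return candidates[np.argsort(final_scores)[-slate_k:][::-1]]


slate = two_stage_recommend(rng.normal(size=d))
print(slate)
```

In the approach the abstract describes, the gate would be trained jointly with the nominators (Mixture-of-Experts style) rather than fixed at random; the sketch only shows where the pool allocation enters the pipeline, and why nominators trained independently on identical pools can end up redundant.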
