On component interactions in two-stage recommender systems

Thanks to their scalability, two-stage recommenders are used by many of today's largest online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce recommendations in two steps: (i) multiple nominators, tuned for low prediction latency, preselect a small subset of candidates from the whole item pool; (ii) a slower but more accurate ranker further narrows down the nominated items and serves them to the user. Despite their popularity, the literature on two-stage recommenders is relatively scarce, and the algorithms are often treated as mere sums of their parts. Such treatment presupposes that two-stage performance is explained by the behavior of the individual components in isolation. This is not the case: using synthetic and real-world data, we demonstrate that interactions between the ranker and the nominators substantially affect overall performance. Motivated by these findings, we derive a generalization lower bound showing that independent nominator training can lead to performance on par with uniformly random recommendations. We find that careful design of item pools, each assigned to a different nominator, alleviates these issues. Since manually searching for a good pool allocation is difficult, we propose to learn one instead using a Mixture-of-Experts based approach. This significantly improves both precision and recall at K.
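To make the pipeline concrete, below is a minimal NumPy sketch of the two-stage setup described above, with a softmax gate standing in for the learned item-pool allocation. All names and model choices here (two_stage_recommend, random linear nominators, a dot-product ranker, a hard argmax pool assignment) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of a two-stage recommender with a gated item-pool
# allocation. All model choices are illustrative stand-ins: the
# nominators are random linear scorers, the ranker is a plain
# dot-product model, and the gate is fixed rather than trained.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_nominators, d, slate_k = 1_000, 4, 16, 10

item_emb = rng.normal(size=(n_items, d))                 # item embeddings
nominator_proj = rng.normal(size=(n_nominators, d, d))   # one cheap model each
gate_w = rng.normal(size=(d, n_nominators))              # MoE-style gate


def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def two_stage_recommend(user_emb, k_per_nominator=50):
    # Pool allocation: the gate assigns each item to one nominator.
    pool = softmax(item_emb @ gate_w).argmax(axis=1)     # (n_items,)

    # Stage 1: each nominator scores only the items in its own pool
    # and preselects its top candidates.
    candidates = []
    for j in range(n_nominators):
        idx = np.flatnonzero(pool == j)
        scores = item_emb[idx] @ (nominator_proj[j] @ user_emb)
        candidates.append(idx[np.argsort(scores)[-k_per_nominator:]])
    candidates = np.unique(np.concatenate(candidates))

    # Stage 2: the (here: exact dot-product) ranker re-scores the
    # nominated items and serves the final slate to the user.
    final_scores = item_emb[candidates] @ user_emb
    return candidates[np.argsort(final_scores)[-slate_k:][::-1]]


slate = two_stage_recommend(rng.normal(size=d))
print(slate)
```

In the approach the abstract describes, the gate would be trained jointly with the nominators (Mixture-of-Experts style) rather than fixed at random; the sketch only shows where the pool allocation enters the pipeline, and why nominators trained independently on identical pools can end up redundant.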
