Learning bilinear model for matching queries and documents

The task of matching data from two heterogeneous domains naturally arises in various areas such as web search, collaborative filtering, and drug design. In web search, existing work has designed relevance models to match queries and documents by exploiting either user clicks or content of queries and documents. To the best of our knowledge, however, there has been little work on principled approaches to leveraging both clicks and content to learn a matching model for search. In this paper, we propose a framework for learning to match heterogeneous objects. The framework learns two linear mappings for two objects respectively, and matches them via the dot product of their images after mapping. Moreover, when different regularizations are enforced, the framework renders a rich family of matching models. With orthonormal constraints on mapping functions, the framework subsumes Partial Least Squares (PLS) as a special case. Alternatively, with a l1+l2 regularization, we obtain a new model called Regularized Mapping to Latent Structures (RMLS). RMLS enjoys many advantages over PLS, including lower time complexity and easy parallelization. To further understand the matching framework, we conduct generalization analysis and apply the result to both PLS and RMLS. We apply the framework to web search and implement both PLS and RMLS using a click-through bipartite with metadata representing features of queries and documents. We test the efficacy and scalability of RMLS and PLS on large scale web search problems. The results show that both PLS and RMLS can significantly outperform baseline methods, while RMLS substantially speeds up the learning process.

[1]  Ricardo A. Baeza-Yates,et al.  Extracting semantic relations from query logs , 2007, KDD '07.

[2]  Hang Li Learning to Rank for Information Retrieval and Natural Language Processing , 2011, Synthesis Lectures on Human Language Technologies.

[3]  Yong Yu,et al.  SVDFeature: a toolkit for feature-based collaborative filtering , 2012, J. Mach. Learn. Res..

[4]  Jacob A. Wegelin,et al.  A Survey of Partial Least Squares (PLS) Methods, with Emphasis on the Two-Block Case , 2000 .

[5]  Wei Wu,et al.  Learning a Robust Relevance Model for Search Using Kernel Methods , 2011, J. Mach. Learn. Res..

[6]  Tie-Yan Liu,et al.  Adapting ranking SVM to document retrieval , 2006, SIGIR.

[7]  Manik Varma,et al.  More generality in efficient multiple kernel learning , 2009, ICML '09.

[8]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[9]  Tomer Hertz,et al.  Boosting margin based distance functions for clustering , 2004, ICML.

[10]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[11]  Koby Crammer,et al.  Pranking with Ranking , 2001, NIPS.

[12]  Yanjun Qi,et al.  Supervised semantic indexing , 2009, ECIR.

[13]  Kaizhu Huang,et al.  Sparse Metric Learning via Smooth Optimization , 2009, NIPS.

[14]  Alexander J. Smola,et al.  Learning the Kernel with Hyperkernels , 2005, J. Mach. Learn. Res..

[15]  Hang Li,et al.  AdaRank: a boosting algorithm for information retrieval , 2007, SIGIR.

[16]  Quoc V. Le,et al.  Abstract , 2003, Appetite.

[17]  Jaana Kekäläinen,et al.  IR evaluation methods for retrieving highly relevant documents , 2000, SIGIR '00.

[18]  CHENGXIANG ZHAI,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[19]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[20]  Cynthia Rudin,et al.  Margin-Based Ranking Meets Boosting in the Middle , 2005, COLT.

[21]  Charles A. Micchelli,et al.  Learning the Kernel Function via Regularization , 2005, J. Mach. Learn. Res..

[22]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[23]  Francis R. Bach,et al.  A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization , 2008, J. Mach. Learn. Res..

[24]  Stephen E. Robertson,et al.  Okapi at TREC , 1992, TREC.

[25]  Stephen E. Robertson,et al.  Okapi at TREC-3 , 1994, TREC.

[26]  Michael R. Lyu,et al.  Learning latent semantic relations from clickthrough data for query suggestion , 2008, CIKM '08.

[27]  Nick Craswell,et al.  Random walks on the click graph , 2007, SIGIR.

[28]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[29]  Jianfeng Gao,et al.  Clickthrough-based latent semantic models for web search , 2011, SIGIR.

[30]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[31]  Yi Liu,et al.  An Efficient Algorithm for Local Distance Metric Learning , 2006, AAAI.

[32]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[33]  Tie-Yan Liu,et al.  Two-Layer Generalization Analysis for Ranking Using Rademacher Average , 2010, NIPS.

[34]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[35]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[36]  Roman Rosipal,et al.  Overview and Recent Advances in Partial Least Squares , 2005, SLSFS.

[37]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[38]  Samy Bengio,et al.  A Discriminative Kernel-Based Approach to Rank Images from Text Queries , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Francis R. Bach,et al.  Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning , 2008, NIPS.

[40]  Tie-Yan Liu,et al.  Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.

[41]  Tommi S. Jaakkola,et al.  Maximum-Margin Matrix Factorization , 2004, NIPS.

[42]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[43]  Shivani Agarwal,et al.  Stability and Generalization of Bipartite Ranking Algorithms , 2005, COLT.

[44]  Corinna Cortes,et al.  Invited talk: Can learning kernels help performance? , 2009, International Conference on Machine Learning.

[45]  Peter J. Schreier,et al.  A Unifying Discussion of Correlation Analysis for Complex Random Vectors , 2008, IEEE Transactions on Signal Processing.

[46]  Thomas Hofmann,et al.  Latent semantic models for collaborative filtering , 2004, TOIS.

[47]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[48]  Tie-Yan Liu,et al.  Learning to rank for information retrieval , 2009, SIGIR.

[49]  Masashi Sugiyama,et al.  Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis , 2007, J. Mach. Learn. Res..

[50]  N. Cristianini,et al.  On Kernel-Target Alignment , 2001, NIPS.

[51]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[52]  Wei Liu,et al.  Learning Distance Metrics with Contextual Constraints for Image Retrieval , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[53]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[54]  David Grangier,et al.  A Discriminative Kernel-based Model to Rank Images from Text Queries , 2007 .

[55]  Hang Li,et al.  Relevance Ranking Using Kernels , 2010, AIRS.