An Efficient Boosting Algorithm for Combining Preferences
by Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer

We study the problem of learning to accurately rank a set of objects by combining a given collection of ranking or preference functions. This problem of combining preferences arises in several applications, such as that of combining the results of different search engines, or the "collaborative-filtering" problem of ranking movies for a user based on the movie rankings provided by other users. In this work, we begin by presenting a formal framework for this general problem. We then describe and analyze an efficient algorithm called RankBoost for combining preferences based on the boosting approach to machine learning. We give theoretical results describing the algorithm's behavior both on the training data and on new test data not seen during training. We also describe an efficient implementation of the algorithm for a particular restricted but common case. We next discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different web search strategies, each of which is a query expansion for a given domain. The second experiment is a collaborative-filtering task for making movie recommendations.
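To make the boosting recipe concrete, the following is a minimal Python sketch of a RankBoost-style learner: maintain a distribution over the pairs of objects that must be ordered correctly, repeatedly select a weak ranker, and reweight the pairs the current ensemble still misorders. The helper names, the binary choice of "best" ranker, and the weight formula alpha = (1/2) ln((1+r)/(1-r)) (valid when weak rankers output values in [0, 1]) are illustrative assumptions based on the general boosting scheme, not the paper's exact pseudocode.

```python
import math
from typing import Callable, Hashable, List, Tuple

def rankboost(
    pairs: List[Tuple[Hashable, Hashable]],      # (x0, x1): x1 should rank above x0
    weak_rankers: List[Callable[[Hashable], float]],  # each h maps an object to [0, 1]
    rounds: int = 10,
) -> Callable[[Hashable], float]:
    """Sketch of a RankBoost-style combiner (assumed details, see lead-in)."""
    # D is a distribution over the crucial pairs we must order correctly.
    d = {p: 1.0 / len(pairs) for p in pairs}
    ensemble: List[Tuple[float, Callable[[Hashable], float]]] = []

    for _ in range(rounds):
        # Pick the weak ranker maximizing |r|, where
        # r = sum_p D(p) * (h(x1) - h(x0)) measures pairwise agreement.
        best_h, best_r = None, 0.0
        for h in weak_rankers:
            r = sum(w * (h(x1) - h(x0)) for (x0, x1), w in d.items())
            if abs(r) > abs(best_r):
                best_h, best_r = h, r
        if best_h is None or best_r == 0.0:
            break  # no weak ranker does better than random on D

        # Clamp to avoid log(0) when a ranker orders every pair perfectly.
        r = max(min(best_r, 1.0 - 1e-9), -1.0 + 1e-9)
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        ensemble.append((alpha, best_h))

        # Reweight: pairs the chosen ranker misorders gain weight.
        for (x0, x1) in d:
            d[(x0, x1)] *= math.exp(alpha * (best_h(x0) - best_h(x1)))
        z = sum(d.values())
        for p in d:
            d[p] /= z

    # Final ranking: score objects by the weighted vote of weak rankers.
    return lambda x: sum(a * h(x) for a, h in ensemble)
```

In the web-search setting of the first experiment, each weak ranker might be, say, the normalized score one query-expansion strategy assigns to a page; the learned combination H(x) = sum_t alpha_t h_t(x) then induces the final ranking over pages.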
