Abstract
We study the problem of learning to accurately rank a set of objects by combining a given collection of ranking or preference functions. This problem of combining preferences arises in several applications, such as that of combining the results of different search engines, or the "collaborative-filtering" problem of ranking movies for a user based on the movie rankings provided by other users. In this work, we begin by presenting a formal framework for this general problem. We then describe and analyze an efficient algorithm called RankBoost for combining preferences based on the boosting approach to machine learning. We give theoretical results describing the algorithm's behavior both on the training data, and on new test data not seen during training. We also describe an efficient implementation of the algorithm for a particular restricted but common case. We next discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different web search strategies, each of which is a query expansion for a given domain. The second experiment is a collaborative-filtering task for making movie recommendations.
- Brian T. Bartell, Garrison W. Cottrell, and Richard K. Belew. Automatic combination of multiple ranked retrieval systems. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1994. Google Scholar
- Peter L. Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2): 525-536, March 1998. Google Scholar
- John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998. Google Scholar
- Rich Caruana, Shumeet Baluja, and Tom Mitchell. Using the future to "sort out" the present: Rankprop and multitask learning for medical risk evaluation. In Advances in Neural Information Processing Systems 8, pages 959-965, 1996.Google Scholar
- William W. Cohen, Robert E. Schapire, and Yoram Singer. Learning to order things. Journal of Artificial Intelligence Research, 10:243-270, 1999. Google Scholar
- Michael Collins. Discriminative reranking for natural language parsing. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000. Google Scholar
- K. Crammer and Y. Singer. A new family of online algorithms for category ranking. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002. Google Scholar
- Koby Crammer and Yoram Singer. Pranking with ranking. In Advances in Neural Information Processing Systems 14, 2001.Google Scholar
- Luc Devroye, Lázló Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.Google Scholar
- O. Etzioni, S. Hanks, T. Jiang, R. M. Karp, O. Madani, and O. Waarts. Efficient information gathering on the internet. In 37th Annual Symposium on Foundations of Computer Science, 1996. Google Scholar
- Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): 119-139, August 1997. Google Scholar
- David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1): 78-150, 1992. Google Scholar
- David Haussler, Michael Kearns, Nick Littlestone, and Manfred K. Warmuth. Equivalence of models for polynomial learnability. Information and Computation, 95(2): 129-161, December 1991. Google Scholar
- Will Hill, Larry Stead, Mark Rosenstein, and George Furnas. Recommending and evaluating choices in a virtual community of use. In Human Factors in Computing Systems CHI'95 Conference Proceedings, pages 194-201, 1995. Google Scholar
- Raj D. Iyer, David D. Lewis, Robert E. Schapire, Yoram Singer, and Amit Singhal. Boosting for document routing. In Proceedings of the Ninth International Conference on Information and Knowledge Management, 2000. Google Scholar
- Guy Lebanon and John Lafferty. Cranking: Combining rankings using conditional probability models on permutations. In Proceedings of the Nineteenth International Conference on Machine Learning, 2002. Google Scholar
- Paul Resnick, Neophytos Iacovou, Mitesh Sushak, Peter Bergstrom, and John Riedl. Grouplens: An open architecture for collaborative filtering of netnews. In Proceedings of Computer Supported Cooperative Work, 1995. Google Scholar
- Gerard Salton. Automatic text processing: the transformation, analysis and retrieval of information by computer. Addison-Wesley, 1989. Google Scholar
- Gerard Salton and Michael J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983. Google Scholar
- Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5): 1651-1686, October 1998.Google Scholar
- Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3): 297-336, December 1999. Google Scholar
- Upendra Shardanand and Pattie Maes. Social information filtering: Algorithms for automating "word of mouth". In Human Factors in Computing Systems CHI'95 Conference Proceedings, 1995. Google Scholar
- V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982. Google Scholar
- Marilyn A. Walker, Owen Rambow, and Monica Rogati. SPoT: A trainable sentence planner. In Proceedings of the 2nd Annual Meeting of the North American Chapter of the Associataion for Computational Linguistics, 2001. Google Scholar
Index Terms
- An efficient boosting algorithm for combining preferences
Recommendations
Evaluating Google queries based on language preferences
This paper evaluates the assumption that users expect search engines to retrieve the same results for queries regardless of the language or the location of the originator. The dependency of the Google search engine on the language and location from ...
A Lucene Optimization Algorithm Combining Word Sequence Features
ICFET '18: Proceedings of the 4th International Conference on Frontiers of Educational TechnologiesAiming at the ElasticSearch search engine used in the research and demonstration platform of local chronicles visualization technology, the content searched only by the keyword in the local chronicles' full text search process is not accurate enough, a ...
Ranking function adaptation with boosting trees
Machine-learned ranking functions have shown successes in Web search engines. With the increasing demands on developing effective ranking functions for different search domains, we have seen a big bottleneck, that is, the problem of insufficient labeled ...
Comments