ABSTRACT
Modern web search engines are federated: a user query is sent to numerous specialized search engines called verticals, such as Web (text documents), News, Image, and Video, and the results returned by these engines are then aggregated, composed into a search result page (SERP), and presented to the user. For a specific query, multiple verticals could be relevant, which makes the placement of these vertical results within blocks of textual web results challenging: how do we represent, assess, and compare the relevance of these heterogeneous entities?
In this paper we present a machine-learning framework for SERP composition in the presence of multiple relevant verticals. First, instead of using the traditional label generation method of human judgment guidelines and trained judges, we use a randomized online auditioning system that allows us to evaluate triples of the form <query, web block, vertical>. We use a pairwise click preference to evaluate whether the web block or the vertical block had better user engagement. Next, we use a hinged feature vector that contains features from the web block to create a common reference frame and augment it with features representing the specific vertical judged by the user. A gradient boosted decision tree is then learned from the training data. For the final composition of the SERP, we place a vertical result at a slot if the score is higher than a computed threshold. The thresholds are algorithmically determined to guarantee specific coverage for verticals at each slot.
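The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vertical names, the fixed-width feature layout, the click-count comparison, and the quantile-style threshold rule are all assumptions made for illustration, and the gradient boosted decision tree that would produce the scores is omitted. The function names (`hinged_features`, `pairwise_label`, `coverage_threshold`) are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical set of verticals; the abstract names News, Image, and Video.
VERTICALS = ["news", "image", "video"]


def hinged_features(web: Sequence[float], vertical: str,
                    vert_feats: Sequence[float], width: int) -> List[float]:
    """Build one vector: web-block features first (the shared reference
    frame), then one fixed-width segment per vertical. Only the segment of
    the judged vertical is filled; the others stay zero. This layout is an
    assumed reading of "hinged feature vector"."""
    if vertical not in VERTICALS:
        raise ValueError(f"unknown vertical: {vertical}")
    if len(vert_feats) > width:
        raise ValueError("vertical features exceed segment width")
    vec = list(web)
    for name in VERTICALS:
        segment = [0.0] * width
        if name == vertical:
            segment[:len(vert_feats)] = vert_feats
        vec.extend(segment)
    return vec


def pairwise_label(web_clicks: int, vertical_clicks: int) -> int:
    """Assumed click preference: +1 if the vertical block drew more clicks,
    -1 if the web block did, 0 on a tie."""
    return (vertical_clicks > web_clicks) - (web_clicks > vertical_clicks)


def coverage_threshold(scores: Sequence[float], coverage: float) -> float:
    """Return a threshold such that at least `coverage` of the given scores
    are >= the threshold (ties at the threshold count as placed here,
    which is a simplification)."""
    if not scores or not 0.0 < coverage <= 1.0:
        raise ValueError("need scores and 0 < coverage <= 1")
    ranked = sorted(scores, reverse=True)
    k = max(1, math.ceil(coverage * len(ranked)))
    return ranked[k - 1]


if __name__ == "__main__":
    x = hinged_features([1.0, 2.0], "image", [5.0], width=2)
    print(x)
    slot_scores = [0.9, 0.1, 0.5, 0.7]  # made-up model scores for one slot
    t = coverage_threshold(slot_scores, coverage=0.5)
    print([s >= t for s in slot_scores])
```

In this sketch a separate threshold would be computed per vertical and per slot from scores on held-out traffic, so that the share of queries showing the vertical at that slot matches the chosen coverage target.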
We use correlation of clicks as our offline metric and show that the click-preference target yields better correlation than models based on human judgments. Furthermore, in online tests for the News and Image verticals, we show higher user engagement for both head and tail queries.
Index Terms
On composition of a federated web search result page: using online users to provide pairwise preference for heterogeneous verticals