ABSTRACT
Learning-to-rank has attracted great attention in the IR community. Much thought and research has been devoted to query-document feature extraction and to the development of sophisticated learning-to-rank algorithms. However, relatively little research has been conducted on selecting documents for learning-to-rank data sets, or on the effect of these choices on the efficiency and effectiveness of learning-to-rank algorithms.
In this paper, we employ a number of document selection methodologies widely used in the context of evaluation: depth-k pooling, sampling (infAP, statAP), active learning (MTC), and on-line heuristics (hedge). Certain methodologies, e.g. sampling and active learning, have been shown to lead to efficient and effective evaluation. We investigate whether they can also enable efficient and effective learning-to-rank, and we compare them with the document selection methodology used to create the LETOR datasets.
Further, the utilized methodologies differ in nature, and thus they construct training data sets with different properties, such as the proportion of relevant documents in the data or the similarity among them. We study how such properties affect the efficiency, effectiveness, and robustness of learning-to-rank collections.
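The simplest of these methodologies, depth-k pooling, selects for each query the union of the top-k documents returned by each participating system; only pooled documents are then judged. A minimal sketch, with hypothetical document IDs and rankings not drawn from the paper:

```python
def depth_k_pool(ranked_lists, k):
    """Return the depth-k pool: the union of the top-k documents
    from each system's ranked list for a single query."""
    pool = set()
    for ranking in ranked_lists:
        pool.update(ranking[:k])
    return pool

# Two hypothetical system rankings for one query.
runs = [
    ["d1", "d2", "d3", "d4"],  # system A
    ["d2", "d5", "d1", "d6"],  # system B
]
print(sorted(depth_k_pool(runs, k=2)))  # → ['d1', 'd2', 'd5']
```

Documents outside the pool are typically treated as non-relevant, which is the source of the selection bias that the sampling and active-learning alternatives above aim to mitigate.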
- J. A. Aslam, V. Pavlu, and R. Savell. A unified model for metasearch and the efficient evaluation of retrieval systems via the hedge algorithm. In J. Callan, G. Cormack, C. Clarke, D. Hawking, and A. Smeaton, editors, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 393--394. ACM Press, July 2003.
- J. A. Aslam, V. Pavlu, and E. Yilmaz. A statistical method for system evaluation using incomplete judgments. In S. Dumais, E. N. Efthimiadis, D. Hawking, and K. Järvelin, editors, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 541--548. ACM Press, August 2006.
- C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 89--96, New York, NY, USA, 2005. ACM.
- C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In B. Schölkopf, J. C. Platt, and T. Hofmann, editors, NIPS, pages 193--200. MIT Press, 2006.
- B. Carterette, J. Allan, and R. Sitaraman. Minimal test collections for retrieval evaluation. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 268--275, 2006.
- B. Carterette, V. Pavlu, E. Kanoulas, J. A. Aslam, and J. Allan. Evaluation over thousands of queries. In S.-H. Myaeng, D. W. Oard, F. Sebastiani, T.-S. Chua, and M.-K. Leong, editors, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 651--658. ACM Press, July 2008.
- W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, and J. Zobel, editors. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, Aug. 1998.
- Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res., 4:933--969, 2003.
- D. Harman. Overview of the third text REtrieval conference (TREC-3). In D. Harman, editor, Overview of the Third Text REtrieval Conference (TREC-3), pages 1--19. U.S. Government Printing Office, Apr. 1995.
- T. Joachims. A support vector method for multivariate performance measures. In International Conference on Machine Learning (ICML), pages 377--384, 2005.
- T. Joachims. Training linear SVMs in linear time. In ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (KDD), pages 217--226, 2006.
- K. S. Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval: development and comparative experiments. Inf. Process. Manage., 36(6):779--808, 2000.
- T.-Y. Liu, J. Xu, T. Qin, W. Xiong, and H. Li. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In Proceedings of the SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, 2007.
- T. Minka and S. Robertson. Selection bias in the LETOR datasets. In Proceedings of the SIGIR 2008 Workshop on Learning to Rank for Information Retrieval, New York, NY, USA, 2008. ACM.
- V. Pavlu. Large Scale IR Evaluation. PhD thesis, Northeastern University, College of Computer and Information Science, 2008.
- T. Qin, T.-Y. Liu, J. Xu, and H. Li. How to make LETOR more useful and reliable. In Proceedings of the SIGIR 2008 Workshop on Learning to Rank for Information Retrieval, New York, NY, USA, 2008. ACM.
- A. Singhal. Modern information retrieval: a brief overview. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 24(4):35--43, 2001.
- M. Taylor, H. Zaragoza, N. Craswell, S. Robertson, and C. Burges. Optimisation methods for ranking functions with multiple parameters. In CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management, pages 585--593, New York, NY, USA, 2006. ACM.
- E. M. Voorhees and D. Harman. Overview of the seventh text retrieval conference (TREC-7). In Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 1--24, 1999.
- E. Yilmaz and J. A. Aslam. Estimating average precision with incomplete and imperfect judgments. In P. S. Yu, V. Tsotras, E. Fox, and B. Liu, editors, Proceedings of the Fifteenth ACM International Conference on Information and Knowledge Management, pages 102--111. ACM Press, November 2006.
- C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179--214, 2004.
- J. Zobel. How reliable are the results of large-scale retrieval experiments? In Croft et al. [7], pages 307--314.
Index Terms: Document selection methodologies for efficient and effective learning-to-rank