Active Learning for Web Search Ranking via Noise Injection

Abstract
Learning to rank has become increasingly important for many information retrieval applications. To reduce the labeling cost of preparing training data, many active sampling algorithms have been proposed. In this article, we propose a novel active-learning-for-ranking strategy called ranking-based sensitivity sampling (RSS), tailored for the Gradient Boosting Decision Tree (GBDT), a machine-learned ranking method widely used in practice by major commercial search engines. We leverage the property of GBDT that samples close to the decision boundary tend to be sensitive to perturbations, and design the active learning strategy accordingly. We further analyze the proposed strategy theoretically by exploring the connection between the sensitivity used for sample selection and model regularization, which provides a potential theoretical guarantee on generalization capability. Because ranking performance metrics place greater weight on top-ranked items, item rank is incorporated into the selection function. In addition, we generalize the proposed technique to several other base learners to show its potential applicability in a wide variety of applications. Extensive experimental results on both a benchmark dataset and a real-world dataset demonstrate that the proposed active learning strategy is highly effective at selecting the most informative examples.
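The core idea described above can be illustrated with a minimal sketch: train a GBDT on a small labeled seed set, inject Gaussian noise into the features of unlabeled examples, measure how much each example's predicted score changes (its sensitivity), discount by the model's current rank for the item, and send the highest-scoring examples for labeling. This is an illustrative approximation, not the paper's exact algorithm: the synthetic data, the noise scale `sigma`, and the DCG-style log discount are all assumptions made for the sketch.

```python
# Hedged sketch of ranking-based sensitivity sampling: select unlabeled
# examples whose GBDT scores are most sensitive to input noise, with a
# rank-aware discount favoring items the model currently ranks highly.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for query-document feature vectors and relevance labels.
X_seed = rng.normal(size=(200, 10))
y_seed = X_seed[:, 0] + 0.5 * X_seed[:, 1]   # toy relevance signal
X_pool = rng.normal(size=(500, 10))          # unlabeled candidate pool

model = GradientBoostingRegressor(n_estimators=50, random_state=0)
model.fit(X_seed, y_seed)

def sensitivity(model, X, n_perturb=20, sigma=0.1, rng=rng):
    """Mean absolute score change under Gaussian input perturbations."""
    base = model.predict(X)
    deltas = np.zeros(len(X))
    for _ in range(n_perturb):
        noisy = X + rng.normal(scale=sigma, size=X.shape)
        deltas += np.abs(model.predict(noisy) - base)
    return deltas / n_perturb

sens = sensitivity(model, X_pool)

# Rank-aware weighting: assign each pool item its rank under the current
# model, then apply a DCG-style log discount (an assumed choice here).
ranks = np.empty(len(X_pool), dtype=int)
ranks[np.argsort(-model.predict(X_pool))] = np.arange(1, len(X_pool) + 1)
score = sens / np.log2(ranks + 1)

budget = 25
selected = np.argsort(-score)[:budget]   # pool indices to send for labeling
```

In a real active learning loop, the selected examples would be labeled by editors, added to the seed set, and the model retrained before the next round of selection.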