ABSTRACT
We introduce a relevance scoring technique that enhances existing relevance scoring schemes with term position information. The technique uses chronological term rank (CTR), which captures the positions of terms as they occur in the sequence of words in a document. CTR is both conceptually and computationally simple compared with other approaches that use document structure information, such as term proximity, term order and document features. CTR works well when paired with Okapi BM25. We evaluate various combinations of CTR with Okapi BM25 to identify the most effective formula, then compare the selected approach against existing methods such as Okapi BM25, pivoted length normalization and language models. Significant improvements are seen consistently across a variety of TREC data and topic sets, as measured by the major retrieval performance metrics. This seems to be the first use of this statistic for relevance scoring. Greater retrieval improvements are likely to be possible with chronological term rank enhanced methods in future work.
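As a rough illustration of the idea, the sketch below records the position of each term's first occurrence in a document and adds a position-based bonus to a BM25 term score. The abstract does not state the selected formula, so everything specific here is an assumption made for illustration: treating CTR as the 1-based position of a term's first occurrence, the additive linear bonus, the `weight` parameter, the BM25 variant (one common form with a smoothed IDF), and all function names.

```python
import math


def chronological_term_ranks(tokens):
    """Map each term to the 1-based position of its first occurrence."""
    ranks = {}
    for position, term in enumerate(tokens, start=1):
        ranks.setdefault(term, position)  # keep only the first position seen
    return ranks


def bm25_term_score(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """One common BM25 variant (smoothed IDF that stays positive)."""
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def ctr_bonus(rank, doc_len, weight=0.5):
    """Hypothetical bonus: larger when the term first appears earlier."""
    return weight * (1.0 - (rank - 1) / doc_len)


def score(query_terms, doc_tokens, doc_freqs, num_docs, avg_doc_len, weight=0.5):
    """BM25 plus the hypothetical CTR bonus, summed over matching query terms."""
    ranks = chronological_term_ranks(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term in query_terms:
        if term not in ranks:
            continue  # term absent from this document
        tf = doc_tokens.count(term)
        total += bm25_term_score(tf, doc_freqs[term], doc_len, avg_doc_len, num_docs)
        total += ctr_bonus(ranks[term], doc_len, weight)
    return total


# Two toy documents with identical length and term frequency;
# only the position of the query term differs.
doc_early = ["retrieval", "a", "b", "c"]
doc_late = ["a", "b", "c", "retrieval"]
doc_freqs = {"retrieval": 2}

early = score(["retrieval"], doc_early, doc_freqs, num_docs=2, avg_doc_len=4.0)
late = score(["retrieval"], doc_late, doc_freqs, num_docs=2, avg_doc_len=4.0)
```

In this toy setup the BM25 component is identical for both documents, so any difference between `early` and `late` comes only from the position bonus. A multiplicative or logarithmic combination would be an equally plausible choice under the same assumptions.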
Index Terms
- Enhancing relevance scoring with chronological term rank