DOI: 10.1145/1277741.1277844
Article

Enhancing relevance scoring with chronological term rank

Published: 23 July 2007

ABSTRACT

We introduce a new relevance scoring technique that enhances existing relevance scoring schemes with term position information. This technique uses chronological term rank (CTR), which captures the positions of terms as they occur in the sequence of words in a document. CTR is both conceptually and computationally simple compared to other approaches that use document structure information, such as term proximity, term order, and document features. CTR works well when paired with Okapi BM25. We evaluate the performance of various combinations of CTR with Okapi BM25 in order to identify the most effective formula. We then compare the performance of the selected approach against that of existing methods such as Okapi BM25, pivoted length normalization, and language models. Significant improvements are seen consistently across a variety of TREC data and topic sets, as measured by the major retrieval performance metrics. This appears to be the first use of this statistic for relevance scoring. Greater retrieval improvements are likely to be possible with chronological term rank enhanced methods in future work.
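As an illustration of the idea, the following is a minimal Python sketch of how a term position statistic could be attached to an Okapi BM25 score. It rests on several assumptions that the abstract does not confirm: the rank of a term is taken to be the 1-based position of its first occurrence in the token sequence, the BM25 term score uses a common non-negative IDF variant, and `ctr_bonus`, `score`, and the `weight` parameter are hypothetical names for one possible combination. It is not the formula selected in the paper.

```python
import math
from collections import Counter


def chronological_term_ranks(tokens):
    """Map each term to the 1-based position of its first occurrence.

    Assumption: "rank" means first-occurrence position; the abstract
    does not spell out the exact definition.
    """
    ranks = {}
    for position, term in enumerate(tokens, start=1):
        ranks.setdefault(term, position)
    return ranks


def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Okapi BM25 contribution of one query term (non-negative IDF variant)."""
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / norm


def ctr_bonus(rank, doc_len):
    """Hypothetical position bonus in (0, 1]: earlier terms score higher.

    This is an illustrative choice, not the paper's formula.
    """
    return 1.0 - (rank - 1) / doc_len


def score(query_terms, tokens, avg_doc_len, doc_freqs, num_docs, weight=0.5):
    """Sum BM25 term scores, each boosted by the hypothetical position bonus."""
    tf = Counter(tokens)
    ranks = chronological_term_ranks(tokens)
    total = 0.0
    for term in query_terms:
        if term not in tf:
            continue  # terms absent from the document contribute nothing
        base = bm25_term_score(tf[term], len(tokens), avg_doc_len,
                               doc_freqs[term], num_docs)
        total += base * (1.0 + weight * ctr_bonus(ranks[term], len(tokens)))
    return total


# Two toy documents with identical term frequencies and lengths;
# only the position of the query term differs.
early = ["ranking", "x", "y", "z"]
late = ["x", "y", "z", "ranking"]
doc_freqs = {"ranking": 2}
early_score = score(["ranking"], early, 4.0, doc_freqs, num_docs=10)
late_score = score(["ranking"], late, 4.0, doc_freqs, num_docs=10)
```

With these toy inputs, plain BM25 would score both documents identically, while the sketch ranks `early` above `late` because the query term appears sooner in the word sequence.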


          • Published in

            ACM Conferences
            SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
            July 2007
            946 pages
            ISBN:9781595935977
            DOI:10.1145/1277741

            Copyright © 2007 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • Article

            Acceptance Rates

            Overall Acceptance Rate: 792 of 3,983 submissions, 20%
