Article

Enhancing relevance scoring with chronological term rank

Authors:
Adam D. Troy, Case Western Reserve University, Cleveland, OH
Guo-Qiang Zhang, Case Western Reserve University, Cleveland, OH

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 2007, Pages 599–606. https://doi.org/10.1145/1277741.1277844

Published: 23 July 2007

ABSTRACT

We introduce a new relevance scoring technique that enhances existing relevance scoring schemes with term position information. This technique uses chronological term rank (CTR), which captures the positions of terms as they occur in the sequence of words in a document. CTR is both conceptually and computationally simple compared to other approaches that use document structure information, such as term proximity, term order, and document features. CTR works well when paired with Okapi BM25. We evaluate the performance of various combinations of CTR with Okapi BM25 in order to identify the most effective formula. We then compare the performance of the selected approach against that of existing methods such as Okapi BM25, pivoted length normalization, and language models. Significant improvements are seen consistently across a variety of TREC data and topic sets, as measured by the major retrieval performance metrics. This seems to be the first use of this statistic for relevance scoring. Greater retrieval improvements are likely to be possible using chronological term rank enhanced methods in future work.
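
As a rough illustration of the idea described in the abstract, the sketch below adds a position-based bonus to a BM25 term weight. The abstract does not state the formula the authors selected, so everything specific here is an assumption made for illustration only: treating CTR as the 1-based position of a term's first occurrence, the `ctr_weight / log2(1 + rank)` decay, the additive combination, and all function and parameter names. The BM25 weight uses a common non-negative idf variant rather than any formula taken from the paper.

```python
import math
from collections import Counter


def first_positions(tokens):
    """Map each term to the 1-based position of its first occurrence.

    Assumption: this is one possible reading of "chronological term rank".
    """
    positions = {}
    for index, term in enumerate(tokens, start=1):
        positions.setdefault(term, index)
    return positions


def bm25_term(tf, df, doc_len, avg_len, n_docs, k1=1.2, b=0.75):
    """BM25 weight for one query term (non-negative idf variant)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1.0) / norm


def score(query, doc, corpus, ctr_weight=0.5):
    """BM25 plus a hypothetical bonus for terms that occur early."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    counts = Counter(doc)
    ranks = first_positions(doc)
    total = 0.0
    for term in query:
        tf = counts.get(term, 0)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        df = sum(1 for d in corpus if term in d)
        base = bm25_term(tf, df, len(doc), avg_len, n_docs)
        # Assumed decay: the earlier the first occurrence, the larger the bonus.
        bonus = ctr_weight / math.log2(1 + ranks[term])
        total += base + bonus
    return total


if __name__ == "__main__":
    corpus = [
        ["retrieval", "is", "fun", "today"],
        ["today", "is", "fun", "retrieval"],
    ]
    for doc in corpus:
        print(doc, score(["retrieval"], doc, corpus))
```

In this toy corpus both documents have the same length and the same term frequency for "retrieval", so their BM25 weights are identical and the ordering is decided only by the assumed position bonus. Setting `ctr_weight=0` recovers plain BM25 scoring.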

Index Terms

Enhancing relevance scoring with chronological term rank
1. Information systems
  1. Information retrieval

Published in
SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741
General Chairs: Wessel Kraaij (TNO, The Netherlands), Arjen P. de Vries (CWI, The Netherlands)
Program Chairs: Charles L. A. Clarke (University of Waterloo, Canada), Norbert Fuhr (University of Duisburg-Essen, Germany), Noriko Kando (National Institute of Informatics, Japan)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
chronological term rank
document structure
relevance ranking
similarity scoring
term position
term weighting
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%