research-article

Active relevance feedback for difficult queries

Authors:
Zuobing Xu, University of California, Santa Cruz, Santa Cruz, CA, USA
Ram Akella, University of California, Santa Cruz, Santa Cruz, CA, USA

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management, October 2008, Pages 459–468. https://doi.org/10.1145/1458082.1458144

Published: 26 October 2008

ABSTRACT

Relevance feedback has been demonstrated to be an effective strategy for improving retrieval accuracy. The existing relevance feedback algorithms based on language models and vector space models are not effective in learning from negative feedback documents, which are abundant when the initial query is difficult. The probabilistic retrieval model has the advantage of being able to naturally improve the estimation of both the relevant and non-relevant models. The Dirichlet compound multinomial (DCM) distribution, which relies on hierarchical Bayesian modeling techniques, is a more appropriate generative model for the probabilistic retrieval model than the traditional multinomial distribution. We propose a new relevance feedback algorithm, based on a mixture model of the DCM distribution, to effectively model the overlaps between the positive and negative feedback documents. Consequently, the new algorithm improves the retrieval performance substantially for difficult queries. To further reduce the amount of human relevance evaluation required, we propose a new active learning algorithm in conjunction with the new relevance feedback model. The new active learning algorithm implicitly models the diversity, density and relevance of unlabeled data in a transductive experimental design framework. Experimental results on several TREC datasets show that both the relevance feedback algorithm and the active learning algorithm significantly improve retrieval accuracy.
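As a rough illustration of the generative model named in the abstract, the sketch below evaluates the standard textbook form of the DCM (Polya) log-probability for a document's term counts and scores a document by the log-odds between a relevant and a non-relevant DCM model. The abstract does not state the density or any parameter values, so the function names `dcm_log_prob` and `log_odds` and the toy parameter vectors are assumptions made for illustration. The paper's mixture model, parameter estimation, and active learning step are not reproduced here.

```python
from math import lgamma

def dcm_log_prob(counts, alpha):
    """Log-probability of a term-count vector under a DCM (Polya) distribution.

    counts: non-negative integer term counts, one per vocabulary term.
    alpha:  positive DCM parameters, one per vocabulary term.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)          # document length
    a = sum(alpha)           # sum of the DCM parameters
    # log of the multinomial coefficient n! / prod_w x_w!
    log_p = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # log of Gamma(A) / Gamma(n + A)
    log_p += lgamma(a) - lgamma(n + a)
    # log of prod_w Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_p += sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return log_p

def log_odds(counts, alpha_rel, alpha_non):
    """Ranking score: log P(doc | relevant) - log P(doc | non-relevant)."""
    return dcm_log_prob(counts, alpha_rel) - dcm_log_prob(counts, alpha_non)

# Hypothetical three-term vocabulary; these parameters are made up for the example.
alpha_rel = [2.0, 0.5, 0.5]   # relevant model favors term 0
alpha_non = [0.5, 2.0, 0.5]   # non-relevant model favors term 1
doc = [3, 0, 1]               # term counts of one document
score = log_odds(doc, alpha_rel, alpha_non)
```

In this toy setting, a document dominated by term 0 receives a positive score, which would rank it above documents dominated by term 1. Working in log space with `lgamma` avoids overflow in the Gamma functions for long documents.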

Published in
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082
General Chair:
James G. Shanahan, Church and Duncan Group Inc, USA
Program Chairs:
Sihem Amer-Yahia, Yahoo! Research, USA
Ioana Manolescu, INRIA, France
Yi Zhang, University of California, Santa Cruz, USA
David A. Evans, JustSystems Evans Research, USA
Alek Kolcz, Microsoft Live Labs, USA
Key-Sun Choi, KAIST, Korea
Abdur Chowdury, Twitter, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
active learning
probabilistic retrieval model
relevance feedback
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%