ABSTRACT
Relevance feedback has been demonstrated to be an effective strategy for improving retrieval accuracy. Existing relevance feedback algorithms based on language models and vector space models are not effective at learning from negative feedback documents, which are abundant when the initial query is difficult. The probabilistic retrieval model has the advantage of naturally improving the estimates of both the relevant and the non-relevant models. The Dirichlet compound multinomial (DCM) distribution, which relies on hierarchical Bayesian modeling techniques, is a more appropriate generative model for the probabilistic retrieval model than the traditional multinomial distribution. We propose a new relevance feedback algorithm, based on a mixture model of the DCM distribution, to effectively model the overlap between the positive and negative feedback documents. Consequently, the new algorithm substantially improves retrieval performance for difficult queries. To further reduce the human relevance-judgment effort, we propose a new active learning algorithm that works in conjunction with the new relevance feedback model. The new active learning algorithm implicitly models the diversity, density, and relevance of unlabeled data in a transductive experimental design framework. Experimental results on several TREC datasets show that both the relevance feedback and the active learning algorithms significantly improve retrieval accuracy.
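To make the two ingredients of the abstract more concrete, the following Python sketch scores a document by the log-odds of a relevant and a non-relevant DCM model, and greedily picks documents to label with a transductive experimental design (TED) criterion. It is a minimal sketch under stated assumptions, not the authors' method: the parameter vectors `alpha_rel` and `alpha_non` are assumed to be already estimated (for example with Minka's fixed-point iteration, not shown), the DCM mixture that models the overlap between positive and negative feedback is omitted, and `ted_select` follows the plain sequential TED procedure of Yu, Bi, and Tresp (ICML 2006) without the relevance component described above. All function names are hypothetical.

```python
from math import lgamma


def dcm_log_likelihood(counts, alpha):
    """Log-probability of a term-count vector under a DCM (Polya) distribution.

    counts[w] is the frequency of word w in the document; alpha[w] > 0 is the
    Dirichlet parameter of word w. Small alpha values make a word that has
    already appeared likely to appear again ("burstiness").
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod_w x_w!
    ll = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Gamma(a0) / Gamma(n + a0)
    ll += lgamma(a0) - lgamma(n + a0)
    # prod_w Gamma(x_w + alpha_w) / Gamma(alpha_w)
    ll += sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    return ll


def dcm_log_odds(counts, alpha_rel, alpha_non):
    """Ranking score: log p(d | relevant) - log p(d | non-relevant).

    The multinomial coefficient is identical in both terms and cancels; it is
    kept only so that each term is a proper log-probability.
    """
    return dcm_log_likelihood(counts, alpha_rel) - dcm_log_likelihood(counts, alpha_non)


def ted_select(kernel, k, mu=0.1):
    """Greedy sequential transductive experimental design (assumed variant).

    kernel is a symmetric n x n list of lists over the unlabeled candidates.
    Each step picks the candidate j maximizing ||K[:, j]||^2 / (K[j][j] + mu),
    then deflates K <- K - K[:, j] K[:, j]^T / (K[j][j] + mu), which discounts
    candidates similar to those already chosen.
    """
    n = len(kernel)
    K = [row[:] for row in kernel]  # work on a copy of the kernel matrix
    selected = []
    for _ in range(min(k, n)):
        best, best_val = None, float("-inf")
        for j in range(n):
            if j in selected:
                continue
            val = sum(K[i][j] ** 2 for i in range(n)) / (K[j][j] + mu)
            if val > best_val:
                best, best_val = j, val
        col = [K[i][best] for i in range(n)]
        denom = K[best][best] + mu
        # Rank-one deflation removes the part of the kernel explained by `best`.
        K = [[K[i][l] - col[i] * col[l] / denom for l in range(n)] for i in range(n)]
        selected.append(best)
    return selected
```

For example, with a linear kernel over three documents where the third duplicates the first, `ted_select([[1, 0, 1], [0, 1, 0], [1, 0, 1]], 2)` returns `[0, 1]` rather than the redundant pair, which shows how the deflation step favors diverse yet representative documents. Likewise, `dcm_log_odds([3, 0], [2.0, 0.5], [0.5, 2.0])` is positive because the document concentrates on the word that the relevant model favors.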
Index Terms
- Active relevance feedback for difficult queries
Recommendations
Interactive content-based image retrieval using relevance feedback
Database search engines are generally used in a one-shot fashion in which a user provides query information to the system and, in return, the system provides a number of database instances to the user. A relevance feedback system allows the user to ...
Image retrieval based on indexing and relevance feedback
In content based image retrieval (CBIR) system, search engine retrieves the images similar to the query image according to a similarity measure. It should be fast enough and must have a high precision of retrieval. Indexing scheme is used to achieve a ...
The Study of Methods for Language Model Based Positive and Negative Relevance Feedback in Information Retrieval
ISISE '12: Proceedings of the 2012 Fourth International Symposium on Information Science and Engineering
Relevance feedback techniques are important to Information retrieval (IR), which can effectively improve the performance of IR. The feedback includes positive and negative relevance one. The most of the previous work using feedback have focused on ...