DOI: 10.1145/1458082.1458144
CIKM conference proceedings · Research article

Active relevance feedback for difficult queries

Published: 26 October 2008

ABSTRACT

Relevance feedback has been demonstrated to be an effective strategy for improving retrieval accuracy. However, existing relevance feedback algorithms based on language models and vector space models are not effective at learning from negative feedback documents, which are abundant when the initial query is difficult. The probabilistic retrieval model has the advantage of naturally improving the estimation of both the relevant and the non-relevant models. The Dirichlet compound multinomial (DCM) distribution, which relies on hierarchical Bayesian modeling techniques, is a more appropriate generative model for probabilistic retrieval than the traditional multinomial distribution. We propose a new relevance feedback algorithm, based on a mixture model of the DCM distribution, to effectively model the overlap between the positive and negative feedback documents. Consequently, the new algorithm substantially improves retrieval performance for difficult queries. To further reduce the cost of human relevance evaluation, we propose a new active learning algorithm in conjunction with the new relevance feedback model. The active learning algorithm implicitly models the diversity, density, and relevance of unlabeled data in a transductive experimental design framework. Experimental results on several TREC datasets show that both the relevance feedback and active learning algorithms significantly improve retrieval accuracy.
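To make the generative model in the abstract concrete: the DCM (Dirichlet compound multinomial, also known as the Pólya distribution) gives the marginal probability of a document's word-count vector after integrating the multinomial parameter over a Dirichlet prior, which is why it captures word burstiness better than a plain multinomial. The sketch below is not the paper's code; it is a minimal standard DCM log-likelihood in Python, with the function name and the toy two-word vocabulary being illustrative assumptions.

```python
from math import lgamma

def dcm_log_likelihood(counts, alpha):
    """Log-probability of a document's word-count vector under a
    Dirichlet compound multinomial (Polya) distribution.

    counts -- per-word counts x_w for the document
    alpha  -- DCM parameters alpha_w (same length as counts)
    """
    n = sum(counts)   # document length
    A = sum(alpha)    # total concentration, sum of alpha_w
    # Multinomial coefficient: n! / prod_w x_w!
    ll = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Dirichlet-multinomial marginal:
    #   Gamma(A)/Gamma(n + A) * prod_w Gamma(x_w + alpha_w)/Gamma(alpha_w)
    ll += lgamma(A) - lgamma(n + A)
    ll += sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    return ll
```

As a sanity check, with a uniform alpha = (1, 1) over a two-word vocabulary, the DCM marginal over a length-3 document is uniform over the four possible count vectors, so each has probability 1/4; a bursty vector like (3, 0) is no less likely than the balanced (2, 1), unlike under a multinomial.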


Published in:
CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management
October 2008 · 1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082
Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rate: 1,861 of 8,427 submissions, 22% (overall)
