
A Bayesian logistic regression model for active relevance feedback

Published: 20 July 2008

ABSTRACT

Relevance feedback, which traditionally uses the terms in the relevant documents to enrich the user's initial query, is an effective method for improving retrieval performance. Traditional relevance feedback algorithms are prone to overfitting because of the limited amount of training data and the large term space. This paper introduces an online Bayesian logistic regression algorithm for incorporating relevance feedback information. The new approach addresses the overfitting problem by projecting the original feature space onto a more compact set that retains the necessary information. The new feature set consists of the original retrieval score, the distance to the relevant documents, and the distance to the non-relevant documents. To reduce the human effort spent judging relevance, we introduce a new active learning algorithm, based on variance reduction, that selects the documents whose labels would most reduce the model variance. This variance reduction approach captures the relevance, diversity, and uncertainty of the unlabeled documents in a principled manner; previous literature has identified these as the critical factors in active learning. Experiments with several TREC datasets demonstrate the effectiveness of the proposed approach.
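The abstract outlines two components: a compact three-feature representation fed to a Bayesian logistic regression, and a variance-reduction criterion for choosing which documents to present to the user for feedback. The sketch below is only an illustration of how such a pipeline could be wired together; it is not the authors' implementation. The cosine distance, the Laplace approximation to the posterior, the trace-of-covariance selection score, and all function and variable names are assumptions made for concreteness.

```python
# Illustrative sketch only (not the paper's code): Bayesian logistic regression
# over a compact 3-feature projection, plus a greedy variance-reduction pick.
import numpy as np
from scipy.optimize import minimize


def project_features(doc_vec, retrieval_score, rel_centroid, nonrel_centroid):
    """Map a document onto the three features named in the abstract:
    original retrieval score, distance to the relevant documents, and
    distance to the non-relevant documents (cosine distance is an assumption)."""
    def cos_dist(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return 1.0 - float(a @ b) / denom if denom > 0 else 1.0
    return np.array([retrieval_score,
                     cos_dist(doc_vec, rel_centroid),
                     cos_dist(doc_vec, nonrel_centroid)])


def fit_bayesian_lr(X, y, prior_var=1.0):
    """MAP weights for logistic regression with a zero-mean Gaussian prior,
    plus a Laplace approximation to the posterior covariance."""
    n, d = X.shape

    def neg_log_posterior(w):
        z = X @ w
        log_lik = y @ z - np.logaddexp(0.0, z).sum()   # labels y in {0, 1}
        log_prior = -0.5 * (w @ w) / prior_var
        return -(log_lik + log_prior)

    w_map = minimize(neg_log_posterior, np.zeros(d), method="BFGS").x
    p = 1.0 / (1.0 + np.exp(-(X @ w_map)))
    # Hessian of the negative log-posterior at the mode
    H = (X * (p * (1.0 - p))[:, None]).T @ X + np.eye(d) / prior_var
    return w_map, np.linalg.inv(H)                      # posterior mean, covariance


def select_for_feedback(X_pool, w_map, cov):
    """Greedy variance-reduction choice: return the index of the unlabeled
    document whose inclusion most shrinks the (Laplace) posterior covariance,
    measured here by its trace -- one of several possible scalarizations."""
    H = np.linalg.inv(cov)
    best_i, best_trace = -1, np.inf
    for i, x in enumerate(X_pool):
        p = 1.0 / (1.0 + np.exp(-(x @ w_map)))
        H_new = H + p * (1.0 - p) * np.outer(x, x)      # expected information gain
        t = np.trace(np.linalg.inv(H_new))
        if t < best_trace:
            best_i, best_trace = i, t
    return best_i
```

In such a sketch, a feedback loop would repeatedly call select_for_feedback, obtain the user's relevance judgment, append the judged document to (X, y), and refit. Because documents similar to already-judged ones add little expected information, a covariance-shrinkage criterion of this kind naturally reflects the uncertainty and diversity effects the abstract attributes to variance reduction.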


    • Published in

      SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
      July 2008, 934 pages
      ISBN: 9781605581644
      DOI: 10.1145/1390334

      Copyright © 2008 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 July 2008


      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
