ABSTRACT
Relevance feedback, which traditionally uses the terms in relevant documents to enrich the user's initial query, is an effective method for improving retrieval performance. Traditional relevance feedback algorithms are prone to overfitting because the amount of training data is limited and the term space is large. This paper introduces an online Bayesian logistic regression algorithm to incorporate relevance feedback information. The new approach addresses the overfitting problem by projecting the original feature space onto a more compact set that retains the necessary information. The new feature set consists of the original retrieval score, the distance to the relevant documents, and the distance to the non-relevant documents. To reduce the human effort required to judge relevance, we introduce a new active learning algorithm that selects documents for user evaluation so as to reduce the model variance. The variance reduction approach captures the relevance, diversity, and uncertainty of the unlabeled documents in a principled manner; previous literature identifies these as the critical factors in active learning. Experiments with several TREC datasets demonstrate the effectiveness of the proposed approach.
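The two ideas in the abstract, the compact three-feature representation and document selection by variance reduction, can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions the abstract does not specify: cosine distance as the metric, the minimum distance to any labeled document as the distance to a set, a Laplace-style Gaussian approximation of the logistic regression posterior, and the trace of the posterior covariance as the variance measure. All function and parameter names are hypothetical.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two dense term vectors (assumed metric)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 if denom == 0 else 1.0 - float(a @ b) / denom


def compact_features(score, doc, relevant, non_relevant):
    """Map a document to [retrieval score, dist. to relevant, dist. to non-relevant].

    Assumption: the distance to a set is the minimum distance to any member;
    an empty set yields the maximum distance of 1.0.
    """
    d_rel = min((cosine_distance(doc, r) for r in relevant), default=1.0)
    d_non = min((cosine_distance(doc, n) for n in non_relevant), default=1.0)
    return np.array([score, d_rel, d_non])


def select_by_variance_reduction(candidates, weights, precision):
    """Return the index of the candidate that most reduces posterior variance.

    Assumption: under a Gaussian approximation of the posterior, labeling a
    document with features x adds p * (1 - p) * x x^T to the precision matrix,
    where p is the predicted relevance probability. The candidate giving the
    smallest trace of the updated covariance is selected.
    """
    best_index, best_trace = -1, np.inf
    for i, x in enumerate(candidates):
        p = 1.0 / (1.0 + np.exp(-(x @ weights)))
        updated = precision + p * (1.0 - p) * np.outer(x, x)
        trace = np.trace(np.linalg.inv(updated))
        if trace < best_trace:
            best_index, best_trace = i, trace
    return best_index
```

Under these assumptions, a candidate reduces the variance most when the model is uncertain about it (p near 0.5) and its feature vector lies in a direction the labeled data has not yet constrained, which is one way to see how uncertainty and diversity can enter a single criterion.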
Index Terms
- A Bayesian logistic regression model for active relevance feedback