research-article

A Bayesian logistic regression model for active relevance feedback

Authors:
Zuobing Xu, University of California, Santa Cruz, Santa Cruz, CA, USA
Ram Akella, University of California, Santa Cruz, Santa Cruz, CA, USA

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 2008, Pages 227–234
https://doi.org/10.1145/1390334.1390375

Published: 20 July 2008

ABSTRACT

Relevance feedback, which traditionally uses the terms in the relevant documents to enrich the user's initial query, is an effective method for improving retrieval performance. Traditional relevance feedback algorithms, however, are prone to overfitting because the amount of training data is limited while the term space is large. This paper introduces an online Bayesian logistic regression algorithm to incorporate relevance feedback information. The new approach addresses the overfitting problem by projecting the original feature space onto a more compact feature set that retains the necessary information. The new feature set consists of the original retrieval score, the distance to the relevant documents and the distance to the non-relevant documents. To reduce the human effort needed to judge relevance, we introduce a new active learning algorithm that selects documents for user evaluation so as to reduce the model variance. This variance reduction approach captures the relevance, diversity and uncertainty of the unlabeled documents in a principled manner; previous literature identifies these as the critical factors in active learning. Experiments with several TREC datasets demonstrate the effectiveness of the proposed approach.
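
The first half of the abstract (project each document onto a compact feature set, then fit a Bayesian logistic regression model that is updated after every feedback round) can be illustrated with a short Python sketch. It rests on assumptions the abstract does not state: the two distances are taken to be cosine distances to the centroids of the judged relevant and judged non-relevant term vectors, the prior over the weights is Gaussian, and the posterior is approximated by Laplace's method (a Gaussian centred at the MAP weights, with covariance equal to the inverse Hessian of the negative log posterior). All function names are hypothetical, and the code is not the authors' implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_distance(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 1.0 - dot(a, b) / (na * nb) if na and nb else 1.0


def project(score, doc_vec, rel_centroid, nonrel_centroid):
    """Hypothetical projection: [bias, retrieval score, dist. to relevant, dist. to non-relevant]."""
    return [1.0, score,
            cosine_distance(doc_vec, rel_centroid),
            cosine_distance(doc_vec, nonrel_centroid)]


def _grad_hess(w, prior_mean, prior_prec, X, y):
    """Gradient and Hessian of the negative log posterior (Gaussian prior + logistic likelihood)."""
    d = len(w)
    grad = matvec(prior_prec, [wi - mi for wi, mi in zip(w, prior_mean)])
    hess = [list(row) for row in prior_prec]
    for x, t in zip(X, y):
        p = sigmoid(dot(w, x))
        for i in range(d):
            grad[i] += (p - t) * x[i]
            for j in range(d):
                hess[i][j] += p * (1.0 - p) * x[i] * x[j]
    return grad, hess


def laplace_update(prior_mean, prior_cov, X, y, iters=25):
    """Newton's method to the MAP weights, then covariance = inverse Hessian at the MAP."""
    prior_prec = invert(prior_cov)
    w = list(prior_mean)
    for _ in range(iters):
        grad, hess = _grad_hess(w, prior_mean, prior_prec, X, y)
        w = [wi - si for wi, si in zip(w, matvec(invert(hess), grad))]
    _, hess = _grad_hess(w, prior_mean, prior_prec, X, y)
    return w, invert(hess)


def predict(mean, x):
    """Plug-in relevance probability at the posterior mean (ignores posterior spread)."""
    return sigmoid(dot(mean, x))
```

Because `laplace_update` returns a Gaussian mean and covariance, its output can be passed back in as the prior for the next round of judgments, which is one plausible reading of "online" here. `predict` uses only the posterior mean; a fuller treatment would also integrate over the posterior covariance.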

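The variance-reduction idea can likewise be sketched under a Laplace/Fisher approximation, assuming (an assumption, not necessarily the paper's criterion) that "model variance" means the trace of the posterior covariance and that a batch is chosen greedily. For logistic regression, labeling a document with feature vector x adds p(1 - p) x x^T to the posterior precision whatever label the user gives, so the expected shrinkage can be computed before any judgment is made. The factor p(1 - p) favours uncertain documents, and updating the covariance after each pick penalizes documents that resemble ones already chosen; this sketch does not model the relevance factor the abstract also mentions. Function names and toy numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def trace_reduction(cov, x, p):
    """Drop in trace(cov) if x is labeled: adding w*x*x^T (w = p(1-p)) to the precision
    shrinks the covariance by w*(cov x)(cov x)^T / (1 + w*x^T cov x) (Sherman-Morrison)."""
    w = p * (1.0 - p)
    sx = matvec(cov, x)
    return w * dot(sx, sx) / (1.0 + w * dot(x, sx))


def rank_one_update(cov, x, p):
    """Covariance after labeling x, via the same Sherman-Morrison identity."""
    w = p * (1.0 - p)
    sx = matvec(cov, x)
    denom = 1.0 + w * dot(x, sx)
    n = len(cov)
    return [[cov[i][j] - w * sx[i] * sx[j] / denom for j in range(n)] for i in range(n)]


def select_batch(mean, cov, candidates, k):
    """Greedily pick k candidate indices that most reduce trace(cov)."""
    probs = [sigmoid(dot(mean, x)) for x in candidates]  # plug-in predictive probabilities
    chosen = []
    for _ in range(min(k, len(candidates))):
        best = max((i for i in range(len(candidates)) if i not in chosen),
                   key=lambda i: trace_reduction(cov, candidates[i], probs[i]))
        chosen.append(best)
        cov = rank_one_update(cov, candidates[best], probs[best])
    return chosen, cov
```

As a toy example, take a zero mean, an identity covariance and three candidates where the first two are identical and the third points in a different direction. All three tie on the first pick, so index 0 is taken; afterwards its duplicate would reduce the trace by about 0.17 while the third candidate would reduce it by about 0.30, so `select_batch` returns the diverse batch `[0, 2]`.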

Index Terms

- Information systems
  - Information retrieval
    - Evaluation of retrieval results
      - Relevance assessment

Recommendations

Active relevance feedback for difficult queries
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

Relevance feedback has been demonstrated to be an effective strategy for improving retrieval accuracy. The existing relevance feedback algorithms based on language models and vector space models are not effective in learning from negative feedback ...
A novel relevance feedback method for CBIR

In this paper, we address the challenge about insufficiency of training set and limited feedback information in each relevance feedback (RF) round during the process of content based image retrieval (CBIR). We propose a novel active learning scheme to ...
Relevance feedback for content-based image retrieval using Bayesian network
VIP '05: Proceedings of the Pan-Sydney area workshop on Visual information processing

Relevance feedback is a powerful query modification technique in the field of content-based image retrieval. The key issue in relevance feedback is how to effectively utilize the feedback information to improve the retrieval performance. This paper ...

Published in
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN:9781605581644
DOI:10.1145/1390334
General Chairs:
Tat-Seng Chua, National University of Singapore
Mun-Kew Leong, National Library Board, Singapore
Program Chairs:
Syung Hyon Myaeng, Information and Communications University, Korea
Douglas W. Oard, University of Maryland, College Park, USA
Fabrizio Sebastiani, Consiglio Nazionale delle Ricerche, Italy
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
active learning
bayesian logistic regression
relevance feedback
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions (20%)

Article Metrics
- 28 total citations
- 1,176 total downloads
- Downloads (last 12 months): 15
- Downloads (last 6 weeks): 1