ABSTRACT
In this paper, we first present an analytical view of heuristic retrieval constraints, which yields simple tests to determine whether a retrieval function satisfies these constraints. We then review empirical findings on word frequency distributions and the central role played by burstiness in this context. This leads us to propose a formal definition of burstiness that can be used to characterize probability distributions with respect to this phenomenon. We then introduce the family of information-based IR models, which naturally capture heuristic retrieval constraints when the underlying probability distribution is bursty, and we propose a new IR model within this family based on the log-logistic distribution. Experiments on three different collections illustrate the good behavior of the log-logistic IR model: it significantly outperforms the Jelinek-Mercer and Dirichlet prior language models on all three collections, with both short and long queries, for both mean average precision (MAP) and precision at 10 documents. It also outperforms the InL2 DFR model on MAP and yields results on a par with it for precision at 10.
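The abstract's two central ideas, a burstiness criterion for probability distributions and an information-based score built on the log-logistic distribution, can be illustrated with a small Python sketch. Everything below is an assumption for illustration, not the paper's definitive formulation. The log-logistic survival function is taken with shape 1, P(X ≥ x | λ) = λ/(x + λ). Burstiness is tested as "P(X ≥ x + ε | X ≥ x) strictly increases with x" on a finite grid. Term frequencies are normalized as t = tf · log(1 + c · avg_len / doc_len) with λ = df / N. All function names (`ll_survival`, `exp_survival`, `is_bursty`, `normalized_tf`, `ll_weight`, `ll_score`, `check_constraints`) are hypothetical. The `check_constraints` helper mimics the "simple tests" idea with finite differences rather than analytical derivatives.

```python
import math
from typing import Callable, Dict, Sequence


def ll_survival(x: float, lam: float) -> float:
    """Assumed log-logistic survival with shape 1: P(X >= x | lam) = lam / (x + lam)."""
    return lam / (x + lam)


def exp_survival(x: float, mu: float) -> float:
    """Exponential survival P(X >= x), a memoryless (non-bursty) baseline."""
    return math.exp(-x / mu)


def is_bursty(survival: Callable[[float], float], eps: float = 1.0,
              xs: Sequence[float] = (0.0, 0.5, 1.0, 2.0, 5.0, 10.0),
              tol: float = 1e-12) -> bool:
    """Grid check: does P(X >= x + eps | X >= x) strictly increase with x?"""
    g = [survival(x + eps) / survival(x) for x in xs]
    # tol keeps floating-point noise from counting as a strict increase
    return all(b - a > tol for a, b in zip(g, g[1:]))


def normalized_tf(tf: float, doc_len: float, avg_len: float, c: float = 1.0) -> float:
    # Hypothetical length normalization: t = tf * log(1 + c * avg_len / doc_len)
    return tf * math.log(1.0 + c * avg_len / doc_len)


def ll_weight(tf: float, doc_len: float, avg_len: float, df: int, n_docs: int,
              c: float = 1.0) -> float:
    """Information content -log P(X >= t | lam), assuming lam = df / n_docs."""
    lam = df / n_docs
    t = normalized_tf(tf, doc_len, avg_len, c)
    return -math.log(ll_survival(t, lam))


def ll_score(query_tf: Dict[str, int], doc_tf: Dict[str, int], doc_len: float,
             avg_len: float, df: Dict[str, int], n_docs: int, c: float = 1.0) -> float:
    """Sum query-weighted information over query terms that occur in the document."""
    return sum(
        (qtf * ll_weight(doc_tf[w], doc_len, avg_len, df[w], n_docs, c)
         for w, qtf in query_tf.items() if doc_tf.get(w, 0) > 0),
        0.0,
    )


def check_constraints(df: int = 50, n_docs: int = 1000,
                      avg_len: float = 100.0) -> Dict[str, bool]:
    """Finite-difference versions of common heuristic retrieval constraints."""
    def w(tf: float, doc_len: float = 100.0, term_df: int = df) -> float:
        return ll_weight(tf, doc_len, avg_len, term_df, n_docs)

    return {
        "tf_increasing": w(1) < w(2) < w(3),                 # more occurrences, higher weight
        "tf_concave": w(2) - w(1) > w(3) - w(2),             # diminishing returns
        "length_penalized": w(2, doc_len=200.0) < w(2),      # longer document, lower weight
        "idf_effect": w(2, term_df=10) > w(2, term_df=100),  # rarer term, higher weight
    }


print(is_bursty(lambda x: ll_survival(x, 0.1)))   # True: log-logistic passes
print(is_bursty(lambda x: exp_survival(x, 2.0)))  # False: exponential is memoryless
print(check_constraints())                        # all four entries True
```

Under these assumptions the log-logistic weight reduces to log(1 + t/λ), which is increasing and concave in t and decreasing in λ; the exponential baseline gives a constant conditional probability and therefore fails the burstiness check.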
Index Terms
- Retrieval constraints and word frequency distributions: a log-logistic model for IR
Recommendations
Information-based models for ad hoc IR
SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
We introduce in this paper the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the document ...
Retrieval constraints and word frequency distributions a log-logistic model for IR
We first present in this paper an analytical view of heuristic retrieval constraints which yields simple tests to determine whether a retrieval function satisfies the constraints or not. We then review empirical findings on word frequency ...
Axiomatic Analysis and Optimization of Information Retrieval Models
ICTIR '13: Proceedings of the 2013 Conference on the Theory of Information Retrieval
The accuracy of a search engine is mostly determined by the optimality of the retrieval model used in the search engine. Developing optimal retrieval models has always been a very important fundamental research problem in information retrieval because an ...