DOI: 10.1145/253168.253222
Article
Free Access

Probabilistic models for document retrieval: a comparison of performance on experimental and synthetic data bases

Published: 01 September 1986

ABSTRACT

Probabilistic document retrieval systems consistent with the two Poisson independence model outperform the binary independence model if the terms are distributed as described by the model's assumptions. The Two Poisson Effectiveness Hypothesis suggests that retrieval models based upon the two Poisson model will outperform binary independence models when used on a “real-world” database, where independence and two Poisson term occurrence distributions fail to hold, because the added information obtained from incorporating term frequency information will more than compensate for the non-Poisson distributions of terms. Searches of the MED1033 database suggest that if terms are not independent and frequencies of term occurrence are not distributed in a two Poisson manner, the binary independence sequential retrieval model outperforms the two Poisson independence retrieval model.
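The contrast between the two families of models can be illustrated with a minimal sketch. It assumes textbook forms that the abstract itself does not spell out: a binary independence weight that uses only the presence or absence of a term, and a frequency-based weight in which the number of occurrences of a term is Poisson distributed with one mean in relevant documents and another in nonrelevant documents. The function names and all parameter values are hypothetical, and the paper's own formulation may differ.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing k occurrences under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binary_weight(present: bool, p: float, q: float) -> float:
    """Binary independence term weight (assumed textbook log-odds form).

    p = P(term present | relevant), q = P(term present | nonrelevant).
    Absent terms contribute 0, which is rank-equivalent to the full model.
    """
    if not present:
        return 0.0
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))


def poisson_weight(tf: int, lam_rel: float, lam_non: float) -> float:
    """Frequency-based term weight (assumed simplification).

    Log likelihood ratio of seeing tf occurrences under the relevant-document
    Poisson versus the nonrelevant-document Poisson.
    """
    return math.log(poisson_pmf(tf, lam_rel) / poisson_pmf(tf, lam_non))


# Hypothetical parameters for a single term; not taken from the paper.
P_REL, P_NON = 0.6, 0.2      # presence probabilities
LAM_REL, LAM_NON = 2.0, 0.5  # mean within-document frequencies

for tf in (0, 1, 3):
    b = binary_weight(tf > 0, P_REL, P_NON)
    f = poisson_weight(tf, LAM_REL, LAM_NON)
    print(f"tf={tf}: binary={b:.3f}  frequency-based={f:.3f}")
```

Under these assumptions the binary weight is the same for one occurrence as for three, while the frequency-based weight keeps growing with the count. That extra signal helps only when the counts really follow the assumed distributions, which is the condition the abstract reports as failing on MED1033.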


    • Published in

      SIGIR '86: Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
      September 1986
      283 pages
      ISBN:0897911873
      DOI:10.1145/253168

      Copyright © 1986 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
