DOI: 10.1145/1008992.1009001
Article

Forming test collections with no system pooling

Published: 25 July 2004

ABSTRACT

Forming test collection relevance judgments from the pooled output of multiple retrieval systems has become the standard process for creating resources such as the TREC, CLEF, and NTCIR test collections. This paper presents a series of experiments examining three different ways of building test collections where no system pooling is used. First, a collection formation technique combining manual feedback and multiple systems is adapted to work with a single retrieval system. Second, an existing method based on pooling the output of multiple manual searches is re-examined: testing a wider range of searchers and retrieval systems than has been examined before. Third, a new approach is explored where the ranked output of a single automatic search on a single retrieval system is assessed for relevance: no pooling whatsoever. Using established techniques for evaluating the quality of relevance judgments, in all three cases, test collections are formed that are as good as TREC.

    • Published in

      SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
      July 2004
      624 pages
      ISBN:1581138814
      DOI:10.1145/1008992

      Copyright © 2004 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
