Information Retrieval Evaluation
June 2011
Publisher:
  • Morgan & Claypool Publishers
ISBN: 978-1-59829-971-7
Published: 03 June 2011
Pages: 120
Abstract

Evaluation has always played a major role in information retrieval, with early pioneers such as Cyril Cleverdon and Gerard Salton laying the foundations for most of the evaluation methodologies in use today. The retrieval community has been extremely fortunate to have such a well-grounded evaluation paradigm during a period when most of the human language technologies were just developing. This lecture has the goal of explaining where these evaluation methodologies came from and how they have continued to adapt to the vastly changed environment in the search engine world today. The lecture starts with a discussion of the early evaluation of information retrieval systems, beginning with the Cranfield testing in the early 1960s, continuing with the Lancaster "user" study for MEDLARS, and presenting the various test collection investigations by the SMART project and by groups in Britain. The emphasis in this first chapter is on the how and the why of the various methodologies developed. The second chapter covers the more recent "batch" evaluations, examining the methodologies used in the various open evaluation campaigns such as TREC, NTCIR (emphasis on Asian languages), CLEF (emphasis on European languages), INEX (emphasis on semi-structured data), etc. Here again the focus is on the how and why, and in particular on the evolution of the older evaluation methodologies to handle new information access techniques, including how the test collection techniques were modified and how the metrics were changed to better reflect operational environments. The final chapters look at evaluation issues in user studies -- the interactive part of information retrieval, including a look at the search log studies mainly done by the commercial search engines. Here the goal is to show, via case studies, how high-level issues of experimental design affect the final evaluations.

Table of Contents: Introduction and Early History / "Batch" Evaluation Since 1992 / Interactive Evaluation / Conclusion

Cited By

  1. Zerhoudi S, Günther S, Plassmeier K, Borst T, Seifert C, Hagen M and Granitzer M The SimIIR 2.0 Framework Proceedings of the 31st ACM International Conference on Information & Knowledge Management, (4661-4666)
  2. Livraga G, Motta A and Viviani M Assessing User Privacy on Social Media: The Twitter Case Study Proceedings of the 2022 Workshop on Open Challenges in Online Social Networks, (1-9)
  3. Ruthven I (2021). Resonance and the experience of relevance, Journal of the Association for Information Science and Technology, 72:5, (554-569), Online publication date: 10-Apr-2021.
  4. Dosso D and Silvello G A Scalable Virtual Document-Based Keyword Search System for RDF Datasets Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, (965-968)
  5. Damessie T, Culpepper J, Kim J and Scholer F Presentation Ordering Effects On Assessor Agreement Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (723-732)
  6. Kutlu M, Elsayed T, Hasanain M and Lease M When Rank Order Isn't Enough Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (397-406)
  7. Zhang H, Abualsaud M, Ghelani N, Smucker M, Cormack G and Grossman M Effective User Interaction for High-Recall Retrieval Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (187-196)
  8. Van Gysel C and de Rijke M Pytrec_eval The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, (873-876)
  9. Angelini M, Ferro N, Santucci G and Silvello G Visual analytics for information retrieval evaluation campaigns Proceedings of the EuroVis Workshop on Visual Analytics, (25-29)
  10. Zahedi Z, Costas R and Wouters P (2017). Mendeley readership as a filtering tool to identify highly cited publications, Journal of the Association for Information Science and Technology, 68:10, (2511-2521), Online publication date: 1-Oct-2017.
  11. Zhang Y, Liu X and Zhai C Information Retrieval Evaluation as Search Simulation Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, (193-200)
  12. Sequiera R and Lin J Finally, a Downloadable Test Collection of Tweets Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, (1225-1228)
  13. Roy D An Improved Test Collection and Baselines for Bibliographic Citation Recommendation Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (2271-2274)
  14. Bergamaschi S, Ferro N, Guerra F and Silvello G Keyword-Based Search Over Databases Transactions on Computational Collective Intelligence XXI - Volume 9630, (1-20)
  15. Ferro N, Silvello G, Keskustalo H, Pirkola A and Järvelin K (2016). The twist measure for IR evaluation, Journal of the Association for Information Science and Technology, 67:3, (620-648), Online publication date: 1-Mar-2016.
  16. Paik J and Lin J Retrievability in API-Based "Evaluation as a Service" Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, (91-94)
  17. Carterette B, Clough P, Hall M, Kanoulas E and Sanderson M Evaluating Retrieval over Sessions Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, (685-688)
  18. Baruah G, Zhang H, Guttikonda R, Lin J, Smucker M and Vechtomova O Optimizing Nugget Annotations with Active Learning Proceedings of the 25th ACM International Conference on Information and Knowledge Management, (2359-2364)
  19. Pääkkönen T, Järvelin K, Kekäläinen J, Keskustalo H, Baskaya F, Maxwell D and Azzopardi L Exploring Behavioral Dimensions in Session Effectiveness Proceedings of the 6th International Conference on Experimental IR Meets Multilinguality, Multimodality, and Interaction - Volume 9283, (178-189)
  20. Ferro N (2014). CLEF 15th Birthday, ACM SIGIR Forum, 48:2, (31-55), Online publication date: 23-Dec-2014.
  21. Agosti M, Fuhr N, Toms E and Vakkari P (2014). Evaluation methodologies in information retrieval: Dagstuhl Seminar 13441, ACM SIGIR Forum, 48:1, (36-41), Online publication date: 26-Jun-2014.
  22. Stocker A, Zoier M, Softic S, Paschke S, Bischofter H and Kern R Is enterprise search useful at all? Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business, (1-8)
  23. Ferrante M, Ferro N and Maistro M Injecting user models and time into precision via Markov chains Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, (597-606)
  24. Voorhees E, Lin J and Efron M On run diversity in Evaluation as a Service Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, (959-962)
  25. Lin J and Efron M Infrastructure support for evaluation as a service Proceedings of the 23rd International Conference on World Wide Web, (79-82)
  26. Clough P and Goodale P Selecting Success Criteria Proceedings of the 4th International Conference on Information Access Evaluation. Multilinguality, Multimodality, and Visualization - Volume 8138, (59-70)
  27. Lupu M and Hanbury A (2013). Patent Retrieval, Foundations and Trends in Information Retrieval, 7:1, (1-97), Online publication date: 20-Feb-2013.
  28. Lin J and Efron M (2013). Evaluation as a service for information retrieval, ACM SIGIR Forum, 47:2, (8-14), Online publication date: 21-Dec-2013.
  29. Bergamaschi S, Ferro N, Guerra F and Silvello G Keyword search and evaluation over relational databases Proceedings of the 7th International Workshop on Ranking in Databases, (1-3)
  30. Agosti M, Di Buccio E, Ferro N, Masiero I, Peruzzo S and Silvello G DIRECTions Proceedings of the Third international conference on Information Access Evaluation: multilinguality, multimodality, and visual analytics, (88-99)
  31. Angelini M, Ferro N, Järvelin K, Keskustalo H, Pirkola A, Santucci G and Silvello G Cumulated relative position Proceedings of the Third international conference on Information Access Evaluation: multilinguality, multimodality, and visual analytics, (112-123)
  32. Järvelin K User-Oriented evaluation in IR Proceedings of the 2012 international conference on Information Retrieval Meets Information Visualization, (86-91)
  33. Harman D TREC-Style evaluations Proceedings of the 2012 international conference on Information Retrieval Meets Information Visualization, (97-115)
  34. Agosti M, Berendsen R, Bogers T, Braschler M, Buitelaar P, Choukri K, Maria Di Nunzio G, Ferro N, Forner P, Hanbury A, Heppin K, Hansen P, Järvelin A, Larsen B, Lupu M, Masiero I, Müller H, Peruzzo S, Petras V, Piroi F, de Rijke M, Santucci G, Silvello G, Toms E, Berendsen R, Hanbury A, Lupu M, Petras V and Silvello G (2012). PROMISE retreat report: prospects and opportunities for information access evaluation, ACM SIGIR Forum, 46:2, (60-84), Online publication date: 21-Dec-2012.
  35. Agosti M, Ferro N and Thanos C (2012). DESIRE 2011, ACM SIGIR Forum, 46:1, (51-55), Online publication date: 20-May-2012.
  36. Järvelin K (2012). IR research, ACM SIGIR Forum, 45:2, (17-31), Online publication date: 9-Jan-2012.
  37. Giner F Information Retrieval Evaluation Measures Defined on Some Axiomatic Models of Preferences, ACM Transactions on Information Systems, 0:0
Contributors
  • National Institute of Standards and Technology
