Evaluation has always played a major role in information retrieval, with early pioneers such as Cyril Cleverdon and Gerard Salton laying the foundations for most of the evaluation methodologies in use today. The retrieval community has been extremely fortunate to have such a well-grounded evaluation paradigm during a period when most of the human language technologies were just developing. This lecture explains where these evaluation methodologies came from and how they have continued to adapt to the vastly changed environment of today's search engine world. The lecture opens with a discussion of the early evaluation of information retrieval systems, starting with the Cranfield testing in the early 1960s, continuing with the Lancaster "user" study for MEDLARS, and presenting the various test collection investigations by the SMART project and by groups in Britain. The emphasis in this chapter is on the how and the why of the various methodologies developed. The second chapter covers the more recent "batch" evaluations, examining the methodologies used in the various open evaluation campaigns such as TREC, NTCIR (emphasis on Asian languages), CLEF (emphasis on European languages), INEX (emphasis on semi-structured data), etc. Here again the focus is on the how and the why, and in particular on how the older evaluation methodologies evolved to handle new information access techniques. This includes how the test collection techniques were modified and how the metrics were changed to better reflect operational environments. The final chapters look at evaluation issues in user studies, the interactive part of information retrieval, including a look at the search log studies mainly done by the commercial search engines. Here the goal is to show, via case studies, how the high-level issues of experimental design affect the final evaluations.
Table of Contents: Introduction and Early History / "Batch" Evaluation Since 1992 / Interactive Evaluation / Conclusion