Abstract
The PROMISE network of excellence organized a two-day brainstorming workshop on 30th and 31st May 2012 in Padua, Italy, to discuss and envisage future directions and perspectives for the evaluation of information access and retrieval systems in multiple languages and multiple media. This document reports on the outcomes of this event and provides details about the six envisaged research lines: search applications; contextual evaluation; challenges in test collection design and exploitation; component-based evaluation; ongoing evaluation; and signal-aware evaluation. The ultimate goal of the PROMISE retreat is to stimulate and involve the research community along these research lines and to provide funding agencies with effective and scientifically sound ideas for coordinating and supporting information access research.
- M. Agosti, R. Berendsen, T. Bogers, M. Braschler, P. Buitelaar, K. Choukri, G. M. Di Nunzio, N. Ferro, P. Forner, A. Hanbury, K. Friberg Heppin, P. Hansen, A. Järvelin, B. Larsen, M. Lupu, I. Masiero, H. Müller, S. Peruzzo, V. Petras, F. Piroi, M. de Rijke, G. Santucci, G. Silvello, and E. Toms. PROMISE Retreat Report -- Prospects and Opportunities for Information Access Evaluation. PROMISE network of excellence, ISBN 978-88-6321-039-2, http://www.promise-noe.eu/promise-retreat-report-2012/, September 2012.
- M. Agosti, M. Braschler, E. Di Buccio, M. Dussin, N. Ferro, G. L. Granato, I. Masiero, E. Pianta, G. Santucci, G. Silvello, and G. Tino. Deliverable D3.2 -- Specification of the evaluation infrastructure based on user requirements. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/fdf43394-0997-4638-9f99-38b2e9c63802, August 2011.
- M. Agosti, E. Di Buccio, N. Ferro, I. Masiero, M. Nicchio, S. Peruzzo, and G. Silvello. Deliverable D3.3 -- Prototype of the Evaluation Infrastructure. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/3783730a-bce3-481b-83df-48e209c6286a, September 2012.
- M. Agosti, E. Di Buccio, N. Ferro, I. Masiero, S. Peruzzo, and G. Silvello. DIRECTions: Design and Specification of an IR Evaluation Infrastructure. In Catarci et al. [24].
- M. Agosti, G. M. Di Nunzio, and N. Ferro. A Proposal to Extend and Enrich the Scientific Data Curation of Evaluation Campaigns. In T. Sakai, M. Sanderson, and D. K. Evans, editors, Proc. 1st International Workshop on Evaluating Information Access (EVIA 2007), pages 62--73. National Institute of Informatics, Tokyo, Japan, 2007.
- M. Agosti, G. M. Di Nunzio, and N. Ferro. The Importance of Scientific Data Curation for Evaluation Campaigns. In C. Thanos, F. Borri, and L. Candela, editors, Digital Libraries: Research and Development. First International DELOS Conference. Revised Selected Papers, pages 157--166. Lecture Notes in Computer Science (LNCS) 4877, Springer, Heidelberg, Germany, 2007.
- M. Agosti and N. Ferro. Towards an Evaluation Infrastructure for DL Performance Evaluation. In G. Tsakonas and C. Papatheodorou, editors, Evaluation of Digital Libraries: An insight into useful applications and methods, pages 93--120. Chandos Publishing, Oxford, UK, 2009.
- M. Agosti, N. Ferro, C. Peters, M. de Rijke, and A. Smeaton, editors. Multilingual and Multimodal Information Access Evaluation. Proceedings of the International Conference of the Cross-Language Evaluation Forum (CLEF 2010). Lecture Notes in Computer Science (LNCS) 6360, Springer, Heidelberg, Germany, 2010.
- M. Agosti, N. Ferro, and C. Thanos. DESIRE 2011 Workshop on Data infrastructurEs for Supporting Information Retrieval Evaluation. SIGIR Forum, 46(1):51--55, June 2012.
- J. Allan, J. Aslam, L. Azzopardi, N. Belkin, P. Borlund, P. Bruza, J. Callan, C. Carman, M. Clarke, N. Craswell, W. B. Croft, J. S. Culpepper, F. Diaz, S. Dumais, N. Ferro, S. Geva, J. Gonzalo, D. Hawking, K. Järvelin, G. Jones, R. Jones, J. Kamps, N. Kando, E. Kanoulas, J. Karlgren, D. Kelly, M. Lease, J. Lin, S. Mizzaro, A. Moffat, V. Murdock, D. W. Oard, M. de Rijke, T. Sakai, M. Sanderson, F. Scholer, L. Si, J. Thom, P. Thomas, A. Trotman, A. Turpin, A. P. de Vries, W. Webber, X. Zhang, and Y. Zhang. Frontiers, Challenges, and Opportunities for Information Retrieval -- Report from SWIRL 2012, The Second Strategic Workshop on Information Retrieval in Lorne, February 2012. SIGIR Forum, 46(1):2--32, June 2012.
- M. Angelini, N. Ferro, G. L. Granato, and G. Santucci. Deliverable D5.3 -- Collaborative User Interface Prototype with Annotation Functionalities. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/8c475e6c-36b5-4822-9fbc-d7d116b3a897, September 2012.
- M. Angelini, N. Ferro, G. Santucci, and G. Silvello. Visual Interactive Failure Analysis: Supporting Users in Information Retrieval Evaluation. In J. Kamps, W. Kraaij, and N. Fuhr, editors, Proc. 4th Symposium on Information Interaction in Context (IIiX 2012), pages 195--203. ACM Press, New York, USA, 2012.
- T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. Improvements that don't add up: ad-hoc retrieval results since 1998. In D. W.-L. Cheung, I.-Y. Song, W. W. Chu, X. Hu, and J. J. Lin, editors, Proc. 18th International Conference on Information and Knowledge Management (CIKM 2009), pages 601--610. ACM Press, New York, USA, 2009.
- N. Asadi, D. Metzler, T. Elsayed, and J. Lin. Pseudo test collections for learning web search ranking functions. In W.-Y. Ma, J.-Y. Nie, R. Baeza-Yates, T.-S. Chua, and W. B. Croft, editors, Proc. 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011), pages 1073--1082. ACM Press, New York, USA, 2011.
- L. Azzopardi, M. de Rijke, and K. Balog. Building simulated queries for known-item topics: an analysis using six European languages. In W. Kraaij, A. P. de Vries, C. L. A. Clarke, N. Fuhr, and N. Kando, editors, Proc. 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007), pages 455--462. ACM Press, New York, USA, 2007.
- S. M. Beitzel, E. C. Jensen, A. Chowdhury, and D. Grossman. Using titles and category names from editor-driven taxonomies for automatic evaluation. In D. Kraft, O. Frieder, J. Hammer, S. Qureshi, and L. Seligman, editors, Proc. 12th International Conference on Information and Knowledge Management (CIKM 2003), pages 17--23. ACM Press, New York, USA, 2003.
- R. Berendsen, M. Braschler, M. Gäde, M. Kleineberg, M. Lupu, V. Petras, and S. Reitberger. Deliverable D4.3 -- Final Report on Alternative Evaluation Methodology. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/0092298d-892b-45c0-a534-b9a3d0c717b1, September 2012.
- R. Berendsen, E. Tsagkias, M. de Rijke, and E. Meij. Generating pseudo test collections for learning to rank scientific articles. In Catarci et al. [24].
- M. Braschler, K. Choukri, N. Ferro, A. Hanbury, J. Karlgren, H. Müller, V. Petras, E. Pianta, M. de Rijke, and G. Santucci. A PROMISE for Experimental Evaluation. In Agosti et al. [8], pages 140--144.
- M. Braschler, D. K. Harman, and E. Pianta, editors. CLEF 2010 Labs and Workshops, Notebook Papers. MINT srl, Trento, Italy, ISBN 978-88-904810-0-0, 2010.
- M. Braschler, S. Reitberger, M. Imhof, A. Järvelin, P. Hansen, M. Lupu, M. Gäde, R. Berendsen, and A. Garcia Seco de Herrera. Deliverable D2.3 -- Best Practices Report. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/086010bb-0d3f-46ef-946f-f0bbeef305e8, August 2012.
- P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80:571--583, 2007.
- V. R. Carvalho, M. Lease, and E. Yilmaz. Crowdsourcing for search evaluation. SIGIR Forum, 44(2):17--22, 2011.
- T. Catarci, P. Forner, D. Hiemstra, A. Peñas, and G. Santucci, editors. Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics. Proceedings of the Third International Conference of the CLEF Initiative (CLEF 2012). Lecture Notes in Computer Science (LNCS) 7488, Springer, Heidelberg, Germany, 2012.
- C. W. Cleverdon. Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. Technical report, Aslib Cranfield Research Project, 1962.
- C. W. Cleverdon. The Cranfield Tests on Index Languages Devices. In K. Spärck Jones and P. Willett, editors, Readings in Information Retrieval, pages 47--60. Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, 1997.
- M. Croce, E. Di Buccio, E. Di Reto, M. Dussin, N. Ferro, G. L. Granato, P. Hansen, M. Lupu, M. Perlorca, A. Pronesti, A. Sabetta, G. Santucci, G. Silvello, G. Tino, and T. Tsikrika. Deliverable D5.2 -- User interface and Visual analytics environment requirements. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/21f1512a-5b47-48ae-834a-89d6441d079e, August 2011.
- E. Deelman, D. Gannon, M. Shields, and I. Taylor. Workflows and e-science: An overview of workflow system features and capabilities. Future Generation Computer Systems, 25(5):528--540, 2009.
- N. Ferro. DIRECT: the First Prototype of the PROMISE Evaluation Infrastructure for Information Retrieval Experimental Evaluation. ERCIM News, 86:54--55, July 2011.
- N. Ferro, A. Hanbury, H. Müller, and G. Santucci. Harnessing the Scientific Data Produced by the Experimental Evaluation of Search Engines and Information Access Systems. Procedia Computer Science, 4:740--749, 2011.
- A. Foncubierta Rodríguez and H. Müller. Ground truth generation in medical imaging, a crowdsourcing-based iterative approach. In W. T. Chu, M. Larson, W. T. Ooi, and K.-T. Chen, editors, Proc. International ACM Workshop on Crowdsourcing for Multimedia (CrowdMM 2012), 2012.
- P. Forner, J. Gonzalo, J. Kekäläinen, M. Lalmas, and M. de Rijke, editors. Multilingual and Multimodal Information Access Evaluation. Proceedings of the Second International Conference of the Cross-Language Evaluation Forum (CLEF 2011). Lecture Notes in Computer Science (LNCS) 6941, Springer, Heidelberg, Germany, 2011.
- P. Forner, J. Karlgren, and C. Womser-Hacker, editors. CLEF 2012 Labs and Workshops, Notebook Papers. MINT srl, Trento, Italy, ISBN 978-88-904810-1-7, 2012.
- A. Hanbury, H. Müller, G. Langs, M. A. Weber, B. H. Menze, and T. S. Fernandez. Bringing the algorithms to the data: cloud-based benchmarking for medical image analysis. In Catarci et al. [24].
- A. Hanbury and H. Müller. Automated component-level evaluation: Present and future. In Agosti et al. [8], pages 124--135.
- P. Hansen, G. L. Granato, and G. Santucci. Collecting and Assessing Collaborative Requirements. In C. Shah, P. Hansen, and R. Capra, editors, Proc. Workshop on Collaborative Information Seeking: Bridging the Gap between Theory and Practice (CIS 2011), 2011.
- P. Hansen and A. Järvelin. Collaborative Information Retrieval in an Information-intensive Domain. Information Processing & Management, 41(5):1101--1119, September 2005.
- D. K. Harman. Information Retrieval Evaluation. Morgan & Claypool Publishers, USA, 2011.
- D. K. Harman and E. M. Voorhees, editors. TREC. Experiment and Evaluation in Information Retrieval. MIT Press, Cambridge (MA), USA, 2005.
- B. Hefley and W. Murphy, editors. Service Science, Management, and Engineering: Education for the 21st Century. Springer, Heidelberg, Germany, 2008.
- B. Huurnink, K. Hofmann, M. de Rijke, and M. Bron. Validating query simulators: An experiment using commercial searches and purchases. In Agosti et al. [8], pages 40--51.
- A. Järvelin, G. Eriksson, P. Hansen, T. Tsikrika, A. Garcia Seco de Herrera, M. Lupu, M. Gäde, V. Petras, S. Reitberger, M. Braschler, and R. Berendsen. Deliverable D2.2 -- Revised Specification of Evaluation Tasks. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/a0d664fe-16e4-4df6-bcf9-1dc3e5e8c18e, February 2012.
- G. Juve and E. Deelman. Scientific Workflows and Clouds. ACM Crossroads, 16(3):14--18, 2010.
- Y. Kano, P. Dobson, M. Nakanishi, J. Tsujii, and S. Ananiadou. Text mining meets workflow: linking U-Compare with Taverna. Bioinformatics, 26(19):2486--2487, 2010.
- D. Kelly. Methods for Evaluating Interactive Information Retrieval Systems with Users. Foundations and Trends in Information Retrieval (FnTIR), 3(1-2), 2009.
- S. Kumpulainen and K. Järvelin. Information Interaction in Molecular Medicine: Integrated Use of Multiple Channels. In N. J. Belkin and D. Kelly, editors, Proc. 3rd Symposium on Information Interaction in Context (IIiX 2010), pages 95--104. ACM Press, New York, USA, 2010.
- M. Lease and E. Yilmaz. Crowdsourcing for information retrieval. SIGIR Forum, 45(2):66--75, 2012.
- B. Mons, H. van Haagen, C. Chichester, P.-B. 't Hoen, J. T. den Dunnen, G. van Ommen, E. van Mulligen, B. Singh, R. Hooft, M. Roos, J. Hammond, B. Kiesel, B. Giardine, J. Velterop, P. Groth, and E. Schultes. The value of data. Nature Genetics, 43:281--283, 2011.
- V. Petras, P. Forner, and P. Clough, editors. CLEF 2011 Labs and Workshops, Notebook Papers. MINT srl, Trento, Italy, ISBN 978-88-904810-1-7, 2011.
- S. Reitberger, M. Imhof, M. Braschler, R. Berendsen, A. Järvelin, P. Hansen, A. Garcia Seco de Herrera, T. Tsikrika, M. Lupu, V. Petras, M. Gäde, M. Kleineberg, and K. Choukri. Deliverable D4.2 -- Tutorial on Evaluation in the Wild. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/3f546a0b-be7c-48df-b228-924cc5e185cb, August 2012.
- S. E. Robertson. On the history of evaluation in IR. Journal of Information Science, 34(4):439--456, 2008.
- B. R. Rowe, D. W. Wood, A. L. Link, and D. A. Simoni. Economic Impact Assessment of NIST's Text REtrieval Conference (TREC) Program. RTI Project Number 0211875, RTI International, USA. http://trec.nist.gov/pubs/2010.economic.impact.pdf, July 2010.
- M. Sanderson. Test Collection Based Evaluation of Information Retrieval Systems. Foundations and Trends in Information Retrieval (FnTIR), 4(4):247--375, 2010.
- J. Spohrer. Editorial Column -- Welcome to Our Declaration of Interdependence. Service Science, 1(1):i--ii, 2009.
- C. V. Thornley, A. C. Johnson, A. F. Smeaton, and H. Lee. The Scholarly Impact of TRECVid (2003-2009). Journal of the American Society for Information Science and Technology (JASIST), 62(4):613--627, April 2011.
- T. Tsikrika, A. Garcia Seco de Herrera, and H. Müller. Assessing the Scholarly Impact of ImageCLEF. In Forner et al. [32], pages 95--106.
- Z. Xie, M. O. Ward, and E. A. Rundensteiner. Visual exploration of stream pattern changes using a data-driven framework. In Proc. 6th International Symposium on Visual Computing (ISVC 2010), Part II, pages 522--532. Springer-Verlag, Berlin, Heidelberg, 2010.