ABSTRACT
Recently the concept of a clarity score was introduced to measure the ambiguity of a query with respect to the collection in which the query issuer is seeking information {Cronen-Townsend et al., 2002}. If the query is expressed in the "same language" as the collection as a whole, it has a low clarity score; otherwise it has a high score, where the score is the relative entropy between the query and collection language models. Cronen-Townsend et al. show that clarity scores correlate with average precision, hence a query with a high clarity score is likely to rank relevant documents highly in the result list. Other authors, however, have shown that high precision does not necessarily translate into improved user performance. In this paper we examine the correlation between user performance and clarity score. Using log files from user experiments conducted within the framework of the TREC Interactive Track, we measure the clarity score of all user queries and the users' actual performance on the search task. Our results show that there is no correlation between the clarity of a query and user performance. The results also demonstrate that users were able to improve their queries slightly, so that subsequent queries had slightly higher clarity scores than initial queries, but this improvement depended neither on the quality of the system they used nor on the user's searching experience.
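The clarity score discussed above can be sketched in a few lines. This is a simplified illustration rather than Cronen-Townsend et al.'s exact procedure: their query language model is estimated from top-ranked retrieved documents (a relevance-model style estimate), whereas this sketch smooths a maximum-likelihood query model with the collection model, with an assumed smoothing weight `lam`. The score itself is the relative entropy (KL divergence, base 2) between the query model and the collection model.

```python
import math
from collections import Counter

def collection_model(documents):
    """Unigram language model P(w|C) over whitespace-tokenized documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def clarity_score(query, documents, lam=0.6):
    """Relative entropy (bits) between a smoothed query model and the
    collection model. A query that uses words much as the collection does
    scores near 0 (ambiguous); a topically focused query scores higher."""
    p_coll = collection_model(documents)
    q_counts = Counter(query.split())
    q_total = sum(q_counts.values())
    score = 0.0
    for w, p_c in p_coll.items():
        # Query model: maximum-likelihood estimate, smoothed with the
        # collection model so every collection word has nonzero probability.
        p_q = lam * (q_counts[w] / q_total) + (1 - lam) * p_c
        score += p_q * math.log2(p_q / p_c)
    return score
```

Because the query model is a mixture that includes the collection model, the divergence is always non-negative, and a broad query whose term distribution resembles the collection's scores lower than a narrowly focused one.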
- {Beaulieu et al., 2002} Beaulieu, M., Baeza-Yates, R., Myaeng, S., and Järvelin, K., editors (2002). Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland. ACM Press, NY.
- {Cronen-Townsend et al., 2002} Cronen-Townsend, S., Zhou, Y., and Croft, W. B. (2002). Predicting query performance. In {Beaulieu et al., 2002}, pages 299--306.
- {Hersh and Over, 1999} Hersh, W. and Over, P. (1999). TREC-8 interactive track report. In {Voorhees and Harman, 1999}, pages 57--64.
- {Hersh and Over, 2000} Hersh, W. and Over, P. (2000). TREC-9 interactive track report. In {Voorhees and Harman, 2000}, pages 41--49.
- {Hersh et al., 2000} Hersh, W., Turpin, A., Price, S., Chan, B., Kraemer, D., Sacherek, L., and Olson, D. (2000). Do batch and user evaluations give the same results? In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 17--24, Athens, Greece. ACM.
- {Robertson and Walker, 1994} Robertson, S. and Walker, S. (1994). Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 232--241, Dublin, Ireland. ACM Press, NY.
- {Salton and McGill, 1983} Salton, G. and McGill, M. (1983). Introduction to Modern Information Retrieval. McGraw-Hill, New York.
- {Singhal et al., 1996} Singhal, A., Buckley, C., and Mitra, M. (1996). Pivoted document length normalization. In Frei, H.-P., Harman, D., Schäuble, P., and Wilkinson, R., editors, Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 21--29, Zurich, Switzerland. ACM Press, NY.
- {Turpin and Hersh, 2001} Turpin, A. and Hersh, W. (2001). Why batch and user evaluations do not give the same results. In Croft, W., Harper, D., Kraft, D., and Zobel, J., editors, Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 225--231, New Orleans, LA. ACM.
- {Turpin and Hersh, 2002} Turpin, A. and Hersh, W. (2002). User interface effects in past batch versus user experiments. In {Beaulieu et al., 2002}, pages 431--432.
- {Voorhees and Harman, 1996} Voorhees, E. M. and Harman, D. K., editors (1996). Proceedings of the Fifth Text REtrieval Conference (TREC-5), Gaithersburg, MD. NIST Special Publication 500-238.
- {Voorhees and Harman, 1999} Voorhees, E. M. and Harman, D. K., editors (1999). Proceedings of the Eighth Text REtrieval Conference (TREC-8), Gaithersburg, MD. NIST Special Publication 500-246.
- {Voorhees and Harman, 2000} Voorhees, E. M. and Harman, D. K., editors (2000). Proceedings of the Ninth Text REtrieval Conference (TREC-9), Gaithersburg, MD. NIST Special Publication 500-249.
- {Witten et al., 1999} Witten, I. H., Moffat, A., and Bell, T. C. (1999). Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishing, San Francisco, second edition.
Do clarity scores for queries correlate with user performance?