Research Article

A Game-Theory Approach for Effective Crowdsource-Based Relevance Assessment

Published: 31 March 2016

Abstract

Despite the ever-increasing popularity of crowdsourcing (CS) in both industry and academia, procedures that ensure the quality of its results remain elusive. We hypothesise that a CS design based on game theory can persuade workers to perform their tasks as quickly as possible and with the highest quality. To this end, we propose a CS framework inspired by the n-person Chicken game. Our aim is to address the problem of CS quality without compromising on CS benefits such as low monetary cost and high task completion speed. With that goal in mind, we study the effects of knowledge updates as well as incentives for good workers to continue playing. As a case study, we define a general task with the characteristics of relevance assessment, a task that has been widely explored with CS in the past because of its cost and complexity. To investigate our hypotheses, we conduct a simulation that studies the effect of the proposed framework on data accuracy, task completion time, and total monetary rewards. Based on a game-theoretical analysis, we study how different types of individuals would behave under a particular game scenario. In particular, we simulate a population composed of different types of workers with varying ability to formulate optimal strategies and learn from their experiences. The simulation results support our hypothesis.
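The core mechanism sketched in the abstract, an n-person Chicken game played by a heterogeneous population of workers who adapt their strategies over repeated rounds, can be illustrated with a minimal simulation. The payoff values, worker types, defection threshold, and learning rule below are illustrative assumptions only, not the parameters or design used in the article.

import random

# Hypothetical payoff parameters (assumptions, not the article's actual values).
REWARD = 1.0             # payment received when the batch passes the quality check
EFFORT_COST = 0.4        # cost of answering diligently ("swerving" in Chicken terms)
CRASH_PAYOFF = 0.0       # everyone loses if too many workers answer carelessly
DEFECT_THRESHOLD = 0.5   # batch fails if more than half of the workers defect

def play_round(strategies):
    """n-person Chicken: defecting pays best unless too many others also defect."""
    defect_rate = strategies.count("defect") / len(strategies)
    payoffs = []
    for s in strategies:
        if defect_rate > DEFECT_THRESHOLD:
            payoffs.append(CRASH_PAYOFF)          # quality check fails for all
        elif s == "defect":
            payoffs.append(REWARD)                # free-rides on diligent peers
        else:
            payoffs.append(REWARD - EFFORT_COST)  # diligent work, net of effort
    return payoffs

class Worker:
    """Toy worker with a defection propensity; strategic workers adapt it."""
    def __init__(self, kind):
        self.kind = kind
        self.p_defect = {"diligent": 0.05, "strategic": 0.5, "careless": 0.9}[kind]

    def choose(self):
        return "defect" if random.random() < self.p_defect else "cooperate"

    def learn(self, action, payoff):
        # Only strategic workers nudge their behaviour toward whatever paid off.
        if self.kind != "strategic":
            return
        if action == "defect" and payoff == CRASH_PAYOFF:
            self.p_defect = max(0.0, self.p_defect - 0.1)
        elif action == "defect":
            self.p_defect = min(1.0, self.p_defect + 0.05)

if __name__ == "__main__":
    random.seed(0)
    crowd = [Worker(k) for k in ["diligent"] * 4 + ["strategic"] * 4 + ["careless"] * 2]
    total_reward = 0.0
    for _ in range(100):
        actions = [w.choose() for w in crowd]
        payoffs = play_round(actions)
        total_reward += sum(payoffs)
        for w, a, p in zip(crowd, actions, payoffs):
            w.learn(a, p)
    print(f"average defection propensity: {sum(w.p_defect for w in crowd) / len(crowd):.2f}")
    print(f"total reward paid out: {total_reward:.1f}")

Running this toy model shows the qualitative tension the article exploits: when too many workers free-ride, the collective payoff collapses, so adaptive workers are pushed back toward diligent behaviour.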



Published in

ACM Transactions on Intelligent Systems and Technology, Volume 7, Issue 4
Special Issue on Crowd in Intelligent Systems, Research Note/Short Paper and Regular Papers
July 2016, 498 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2906145
Editor: Yu Zheng

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 31 March 2016
        • Revised: 1 December 2015
        • Accepted: 1 December 2015
        • Received: 1 February 2015
Published in TIST Volume 7, Issue 4


