Abstract
Despite the ever-increasing popularity of crowdsourcing (CS) in both industry and academia, procedures that ensure the quality of its results remain elusive. We hypothesise that a CS design grounded in game theory can persuade workers to complete their tasks as quickly as possible and with the highest quality. To that end, this article proposes a CS framework inspired by the n-person Chicken game. Our aim is to address the problem of CS quality without compromising on CS benefits such as low monetary cost and fast task completion. With that goal in mind, we study the effects of knowledge updates as well as incentives for good workers to keep playing. As a case study, we define a general task with the characteristics of relevance assessment, a task widely explored with CS in the past because of its potential cost and complexity. To investigate our hypotheses, we conduct a simulation that measures the effect of the proposed framework on data accuracy, task completion time, and total monetary rewards. Based on a game-theoretical analysis, we study how different types of individuals behave under a particular game scenario. In particular, we simulate a population of workers with varying abilities to formulate optimal strategies and to learn from experience. The simulation results support our hypothesis.
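To make the setup above more concrete, the following is a minimal sketch of an agent-based n-person Chicken game simulation of the kind the abstract describes. The payoff scheme, learning rule, and all parameter values (`reward`, `bonus`, `penalty`, `learn_rate`) are illustrative assumptions, not the authors' actual framework: cooperators (diligent workers) always earn the base reward, defectors (careless workers) free-ride only while enough others keep quality up, and workers adjust their propensity to cooperate from experience.

```python
import random

# Hypothetical payoff scheme for an n-person Chicken game:
# "cooperate" = complete the task diligently; "defect" = answer carelessly.
# Defectors gain a bonus by free-riding on cooperators, but if fewer than
# half cooperate, quality control catches them and they are penalised.

def payoff(action, n_cooperators, n_players,
           reward=1.0, bonus=0.5, penalty=-1.0):
    coop_ratio = n_cooperators / n_players
    if action == "cooperate":
        return reward  # diligent work always earns the base reward
    return reward + bonus if coop_ratio >= 0.5 else penalty

def simulate(n_players=10, rounds=100, learn_rate=0.1, seed=0):
    rng = random.Random(seed)
    # each worker starts with some propensity to cooperate
    p_coop = [rng.uniform(0.3, 0.9) for _ in range(n_players)]
    history = []
    for _ in range(rounds):
        actions = ["cooperate" if rng.random() < p else "defect"
                   for p in p_coop]
        k = actions.count("cooperate")
        for i, a in enumerate(actions):
            gain = payoff(a, k, n_players)
            # simple reinforcement: nudge each worker's propensity
            # toward whichever action proved profitable this round
            if a == "cooperate":
                p_coop[i] = min(1.0, p_coop[i] + learn_rate * gain / 10)
            else:
                p_coop[i] = max(0.0, p_coop[i] - learn_rate * gain / 10)
        history.append(k / n_players)  # track the cooperation rate
    return history

rates = simulate()
print(f"mean cooperation rate: {sum(rates) / len(rates):.2f}")
```

Varying the initial propensities and the learning rate is one way to model the heterogeneous worker population the abstract mentions, with accuracy, completion time, and total reward derived from the per-round actions and payoffs.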
Index Terms
- A Game-Theory Approach for Effective Crowdsource-Based Relevance Assessment