Research Article

A Game-Theory Approach for Effective Crowdsource-Based Relevance Assessment

Published: 31 March 2016

Abstract

Despite the ever-increasing popularity of crowdsourcing (CS) in both industry and academia, procedures that ensure the quality of its results remain elusive. We hypothesise that a CS design based on game theory can persuade workers to perform their tasks as quickly as possible and with the highest quality. To this end, we propose a CS framework inspired by the n-person Chicken game. Our aim is to address the problem of CS quality without compromising on CS benefits such as low monetary cost and high task completion speed. With that goal in mind, we study the effects of knowledge updates as well as incentives for good workers to continue playing. As a case study, we define a general task with the characteristics of relevance assessment, a task that has been widely explored with CS in the past because of its cost and complexity. To investigate our hypotheses, we conduct a simulation that studies the effect of the proposed framework on data accuracy, task completion time, and total monetary rewards. Based on a game-theoretical analysis, we study how different types of individuals would behave under a particular game scenario. In particular, we simulate a population composed of different types of workers with varying ability to formulate optimal strategies and learn from their experiences. The simulation results support our hypothesis.
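The core mechanism sketched in the abstract, an n-person Chicken game played by a heterogeneous population of workers who adapt their strategies over repeated rounds, can be illustrated with a minimal simulation. The payoff values, worker types, defection threshold, and learning rule below are illustrative assumptions only, not the parameters or design used in the article.

import random

# Hypothetical payoff parameters (assumptions, not the article's actual values).
REWARD = 1.0             # payment received when the batch passes the quality check
EFFORT_COST = 0.4        # cost of answering diligently ("swerving" in Chicken terms)
CRASH_PAYOFF = 0.0       # everyone loses if too many workers answer carelessly
DEFECT_THRESHOLD = 0.5   # batch fails if more than half of the workers defect

def play_round(strategies):
    """n-person Chicken: defecting pays best unless too many others also defect."""
    defect_rate = strategies.count("defect") / len(strategies)
    payoffs = []
    for s in strategies:
        if defect_rate > DEFECT_THRESHOLD:
            payoffs.append(CRASH_PAYOFF)          # quality check fails for all
        elif s == "defect":
            payoffs.append(REWARD)                # free-rides on diligent peers
        else:
            payoffs.append(REWARD - EFFORT_COST)  # diligent work, net of effort
    return payoffs

class Worker:
    """Toy worker with a defection propensity; strategic workers adapt it."""
    def __init__(self, kind):
        self.kind = kind
        self.p_defect = {"diligent": 0.05, "strategic": 0.5, "careless": 0.9}[kind]

    def choose(self):
        return "defect" if random.random() < self.p_defect else "cooperate"

    def learn(self, action, payoff):
        # Only strategic workers nudge their behaviour toward whatever paid off.
        if self.kind != "strategic":
            return
        if action == "defect" and payoff == CRASH_PAYOFF:
            self.p_defect = max(0.0, self.p_defect - 0.1)
        elif action == "defect":
            self.p_defect = min(1.0, self.p_defect + 0.05)

if __name__ == "__main__":
    random.seed(0)
    crowd = [Worker(k) for k in ["diligent"] * 4 + ["strategic"] * 4 + ["careless"] * 2]
    total_reward = 0.0
    for _ in range(100):
        actions = [w.choose() for w in crowd]
        payoffs = play_round(actions)
        total_reward += sum(payoffs)
        for w, a, p in zip(crowd, actions, payoffs):
            w.learn(a, p)
    print(f"average defection propensity: {sum(w.p_defect for w in crowd) / len(crowd):.2f}")
    print(f"total reward paid out: {total_reward:.1f}")

Running this toy model shows the qualitative tension the article exploits: when too many workers free-ride, the collective payoff collapses, so adaptive workers are pushed back toward diligent behaviour.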



Published in

ACM Transactions on Intelligent Systems and Technology, Volume 7, Issue 4
Special Issue on Crowd in Intelligent Systems, Research Note/Short Paper and Regular Papers
July 2016, 498 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2906145
Editor: Yu Zheng

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 31 March 2016
        • Revised: 1 December 2015
        • Accepted: 1 December 2015
        • Received: 1 February 2015
Published in TIST Volume 7, Issue 4


