DOI: 10.1145/3184558.3191543

CrowdED: Guideline for Optimal Crowdsourcing Experimental Design

Published: 23 April 2018

ABSTRACT

Crowdsourcing involves creating HITs (Human Intelligence Tasks), submitting them to a crowdsourcing platform, and providing a monetary reward for each HIT. One of the advantages of crowdsourcing is that tasks can be highly parallelized: the work is performed by a large number of workers in a decentralized setting. The design also offers a means to cross-check the accuracy of the answers by assigning each task to more than one person and relying on majority consensus, as well as to reward workers according to their performance and productivity. Since each worker is paid per task, costs can increase significantly, irrespective of the overall accuracy of the results. Thus, an important question that arises when designing such crowdsourcing tasks is how many workers to employ and how many tasks to assign to each worker when dealing with large numbers of tasks. The main research question we aim to answer is: 'Can we a priori estimate the optimal assignment of workers and tasks to obtain maximum accuracy on all tasks?' We therefore introduce CrowdED, a two-staged statistical guideline for optimal crowdsourcing experimental design that a priori estimates the optimal worker and task assignment needed to obtain maximum accuracy on all tasks. We describe the algorithm and present preliminary results and discussion. We implement the algorithm in Python and make it openly available on GitHub, and we provide a Jupyter Notebook and an R Shiny app for users to re-use, interact with, and apply in their own crowdsourcing experiments.
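The trade-off the guideline addresses can be illustrated with a small simulation: for a fixed budget of paid judgments, adding redundancy (more workers per task) raises per-task accuracy under majority voting but reduces how many tasks the budget can cover. The Python sketch below is a minimal illustration of that trade-off, not the CrowdED algorithm itself; the worker reliability value, the budget, and the binary majority-vote aggregation are assumptions made for this example only.

import random

def simulate_accuracy(n_tasks, workers_per_task, worker_reliability=0.7,
                      n_runs=50, seed=42):
    """Estimate the fraction of binary tasks answered correctly by strict
    majority vote, assuming each judgment is independently correct with
    probability `worker_reliability` (an assumed value, not from the paper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        correct = 0
        for _ in range(n_tasks):
            votes = sum(rng.random() < worker_reliability
                        for _ in range(workers_per_task))
            if votes > workers_per_task / 2:  # strict majority got it right
                correct += 1
        total += correct / n_tasks
    return total / n_runs

if __name__ == "__main__":
    budget = 3000  # hypothetical total number of paid judgments
    for redundancy in (1, 3, 5, 7):           # odd values avoid vote ties
        tasks_covered = budget // redundancy  # fewer tasks as redundancy grows
        acc = simulate_accuracy(tasks_covered, redundancy)
        print(f"{redundancy} workers/task: {tasks_covered} tasks covered, "
              f"estimated per-task accuracy {acc:.3f}")

Under these assumptions, per-task accuracy rises with redundancy while task coverage shrinks for the same budget, which is precisely the worker-and-task assignment trade-off that CrowdED aims to estimate a priori, before running the actual experiment.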


Published in
WWW '18: Companion Proceedings of The Web Conference 2018
April 2018, 2023 pages
ISBN: 9781450356404
Copyright © 2018 ACM


Publisher
International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland
