ABSTRACT
Crowdsourcing involves creating HITs (Human Intelligence Tasks), submitting them to a crowdsourcing platform, and providing a monetary reward for each HIT. One advantage of crowdsourcing is that tasks can be highly parallelized: the work is performed by a large number of workers in a decentralized setting. The design also offers a means to cross-check the accuracy of answers, by assigning each task to more than one worker and relying on majority consensus, and to reward workers according to their performance and productivity. Since each worker is paid per task, costs can increase significantly, irrespective of the overall accuracy of the results. Thus, an important question that arises when designing crowdsourcing experiments over large numbers of tasks is how many workers to employ and how many tasks to assign to each worker. The main research question we aim to answer is: 'Can we a priori estimate the optimal assignment of workers and tasks so as to obtain maximum accuracy on all tasks?'. To this end, we introduce CrowdED, a two-staged statistical guideline for optimal crowdsourcing experimental design. We describe the algorithm and present preliminary results and discussions. We implement the algorithm in Python and make it openly available on GitHub, and we provide a Jupyter Notebook and an R Shiny app that users can re-use, interact with, and apply in their own crowdsourcing experiments.
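To make the redundancy-versus-cost trade-off concrete, the minimal Python sketch below estimates the smallest number of workers per task whose majority vote reaches a target accuracy, assuming independent workers with a common per-worker accuracy p (for example, estimated from a small stage-1 pilot on gold-standard tasks). This simplified binomial model and all names here (majority_vote_accuracy, smallest_redundancy, the example values) are illustrative assumptions for exposition, not the CrowdED algorithm itself.

import math

def majority_vote_accuracy(p, k):
    """Probability that a majority vote of k independent workers is
    correct, where each worker answers correctly with probability p.
    Ties (possible for even k) are broken at random."""
    correct = 0.0
    for i in range(k + 1):
        prob = math.comb(k, i) * p**i * (1 - p)**(k - i)
        if i > k - i:          # strict majority answers correctly
            correct += prob
        elif i == k - i:       # tie: coin-flip resolution
            correct += 0.5 * prob
    return correct

def smallest_redundancy(p, target, k_max=25):
    """Smallest number of workers per task whose majority vote reaches
    the target accuracy, or None if k_max workers are not enough."""
    for k in range(1, k_max + 1):
        if majority_vote_accuracy(p, k) >= target:
            return k
    return None

if __name__ == "__main__":
    # Hypothetical stage-1 estimate of per-worker accuracy and a
    # desired overall accuracy target.
    p_est, target = 0.7, 0.95
    k = smallest_redundancy(p_est, target)
    print("workers per task:", k)
    if k is not None:
        # Total cost grows linearly with the chosen redundancy.
        n_tasks, reward = 1000, 0.05   # illustrative values
        print("estimated cost: $%.2f" % (n_tasks * k * reward))

This back-of-envelope model shows why a-priori estimation matters: the required redundancy, and hence the budget, rises quickly as the target accuracy approaches 1, so fixing the worker-task assignment before launching the full experiment can avoid paying for answers that do not improve overall accuracy.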