Abstract
Crowdsourcing is popular for large-scale data collection and labeling, but detecting low-quality submissions remains a major challenge. Recent studies have demonstrated that workers' behavioral features are highly correlated with data quality and can be useful for quality control. However, these studies primarily leveraged coarsely extracted behavioral features and did not explore quality control at a finer granularity, i.e., the annotation unit level. In this paper, we investigate the feasibility and benefits of using fine-grained behavioral features, i.e., behavioral features extracted from a worker's interactions with each individual unit in a subtask, for quality control in crowdsourcing. We design and implement a framework named Fine-grained Behavior-based Quality Control (FBQC) that extracts fine-grained behavioral features to provide three quality control mechanisms: (1) quality prediction for objective tasks, (2) suspicious behavior detection for subjective tasks, and (3) unsupervised worker categorization. Using the FBQC framework, we conduct two real-world crowdsourcing experiments and demonstrate that fine-grained behavioral features are both feasible and beneficial for all three quality control mechanisms. Our work offers insights that can help job requesters and crowdsourcing platforms achieve better quality control.
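The abstract does not spell out which interaction signals FBQC records per annotation unit, so the following is a minimal sketch of what unit-level behavioral feature extraction could look like. The event schema and feature names (dwell_time, click_count, answer_changes) are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch: aggregating raw interaction events into per-(worker, unit)
# behavioral features. Assumed event format (hypothetical, not from the
# paper): (worker_id, unit_id, event_type, timestamp_seconds).

from collections import defaultdict

def extract_unit_features(events):
    """Return {(worker_id, unit_id): feature_dict} from raw interaction events."""
    by_unit = defaultdict(list)
    for worker_id, unit_id, event_type, ts in events:
        by_unit[(worker_id, unit_id)].append((event_type, ts))

    features = {}
    for key, evs in by_unit.items():
        evs.sort(key=lambda e: e[1])          # order events by timestamp
        timestamps = [ts for _, ts in evs]
        features[key] = {
            # Time from first to last interaction with this unit.
            "dwell_time": timestamps[-1] - timestamps[0],
            # How often the worker clicked within this unit.
            "click_count": sum(1 for et, _ in evs if et == "click"),
            # How often the worker revised an answer for this unit.
            "answer_changes": sum(1 for et, _ in evs if et == "change_answer"),
        }
    return features

# Example: one worker on two units; u2 is answered suspiciously fast.
events = [
    ("w1", "u1", "focus", 0.0), ("w1", "u1", "click", 1.2),
    ("w1", "u1", "change_answer", 2.0), ("w1", "u1", "submit", 2.5),
    ("w1", "u2", "focus", 3.0), ("w1", "u2", "click", 3.1),
    ("w1", "u2", "submit", 3.2),
]
print(extract_unit_features(events))
```

Feature vectors of this kind could then feed the three mechanisms the abstract lists, e.g., a supervised classifier for quality prediction, outlier detection for suspicious behavior, and clustering for worker categorization.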