
Quality Control in Crowdsourcing based on Fine-Grained Behavioral Features

Published: 18 October 2021

Abstract

Crowdsourcing is popular for large-scale data collection and labeling, but detecting low-quality submissions remains a major challenge. Recent studies have demonstrated that workers' behavioral features are highly correlated with data quality and can be useful for quality control. However, these studies primarily leveraged coarsely extracted behavioral features and did not explore quality control at the fine-grained level, i.e., the annotation unit level. In this paper, we investigate the feasibility and benefits of using fine-grained behavioral features, that is, behavioral features extracted from a worker's interactions with each individual unit in a subtask, for quality control in crowdsourcing. We design and implement a framework named Fine-grained Behavior-based Quality Control (FBQC) that extracts fine-grained behavioral features to provide three quality control mechanisms: (1) quality prediction for objective tasks, (2) suspicious behavior detection for subjective tasks, and (3) unsupervised worker categorization. Using the FBQC framework, we conduct two real-world crowdsourcing experiments and demonstrate that fine-grained behavioral features are feasible and beneficial in all three quality control mechanisms. Our work offers insights that can help job requesters and crowdsourcing platforms achieve better quality control.
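To make the idea of unit-level (fine-grained) behavioral features concrete, the following is a minimal sketch of what per-unit feature extraction and quality prediction for objective tasks might look like. It is not the authors' implementation: the log format, the feature names (dwell time, cursor travel, click count, revisits), and the use of a generic scikit-learn classifier are all assumptions made for illustration.

```python
# Illustrative sketch only: feature names, log format, and classifier are
# assumptions for illustration, not the FBQC authors' implementation.
from dataclasses import dataclass
from typing import List
from sklearn.ensemble import RandomForestClassifier


@dataclass
class UnitInteraction:
    """Interaction events a worker produced on one annotation unit (assumed log format)."""
    unit_id: str
    dwell_time_s: float       # time spent on this unit
    mouse_distance_px: float  # total cursor travel while the unit was focused
    click_count: int          # clicks made on this unit
    revisit_count: int        # times the worker returned to the unit


def unit_features(ui: UnitInteraction) -> List[float]:
    """Fine-grained behavioral feature vector, one per annotation unit."""
    return [ui.dwell_time_s, ui.mouse_distance_px, ui.click_count, ui.revisit_count]


def train_quality_model(logged_units: List[UnitInteraction], correct: List[int]):
    """Quality prediction for objective tasks: map unit-level behavioral
    features to a binary label (answer correct or not), trained on units
    with known ground truth (e.g., gold questions)."""
    X = [unit_features(u) for u in logged_units]
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, correct)
    return model


# Usage with hypothetical data:
# model = train_quality_model(gold_units, gold_labels)
# p_correct = model.predict_proba([unit_features(new_unit)])[0][1]
```

The same per-unit feature vectors could, in principle, feed the other two mechanisms the abstract names, e.g., flagging units whose features deviate strongly from a worker's norm (suspicious behavior detection) or clustering workers by their aggregated unit-level features (unsupervised worker categorization); the specific models for those steps are likewise left open here.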



Published in

Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW2 (October 2021), 5376 pages
EISSN: 2573-0142
DOI: 10.1145/3493286

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States
