Abstract
We present an incremental Bayesian model that resolves key issues of crowd size and data quality for consensus labeling. We evaluate our method using data collected from a real-world citizen science program, BeeWatch, which invites members of the public in the United Kingdom to classify (label) photographs of bumblebees as one of 22 possible species. The biological recording domain poses two key and hitherto unaddressed challenges for consensus models of crowdsourcing: (1) the large number of potential species makes classification difficult, and (2) this is compounded by limited crowd availability, stemming from both the inherent difficulty of the task and the lack of relevant skills among the general public. We demonstrate that consensus labels can be reliably found in such circumstances with very small crowd sizes of around three to five users (i.e., through group sourcing). Our incremental Bayesian model, which minimizes crowd size by re-evaluating the quality of the consensus label following each species identification solicited from the crowd, is competitive with a Bayesian approach that uses a larger but fixed crowd size and outperforms majority voting. These results have important ecological applicability: biological recording programs such as BeeWatch can sustain themselves when resources such as taxonomic experts to confirm identifications by photo submitters are scarce (as is typically the case), and feedback can be provided to submitters in a timely fashion. More generally, our model provides benefits to any crowdsourced consensus labeling task where there is a cost (financial or otherwise) associated with soliciting a label.
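To make the abstract's description of incremental label solicitation concrete, the following is a minimal sketch (not the authors' actual model) of Bayesian consensus labeling with an early-stopping rule. It assumes a uniform prior over species, a single accuracy parameter per user rather than a full confusion model, and a fixed posterior-confidence threshold; the function incremental_consensus and its parameters are illustrative only.

    # Hedged sketch: incremental Bayesian consensus with early stopping.
    # Labels are solicited one at a time; after each vote the posterior over
    # species is updated, and solicitation stops once the leading species
    # exceeds the confidence threshold.

    def incremental_consensus(labels_with_accuracy, species, threshold=0.9):
        """labels_with_accuracy: iterable of (label, user_accuracy) pairs,
        in the order users are solicited.
        species: list of candidate species (e.g., 22 bumblebee species).
        Returns (consensus_label, posterior_probability, labels_used)."""
        n = len(species)
        posterior = {s: 1.0 / n for s in species}          # uniform prior
        used = 0
        for label, acc in labels_with_accuracy:
            used += 1
            # Likelihood of the observed label under each candidate species:
            # correct with probability acc, otherwise spread over the rest.
            for s in species:
                likelihood = acc if s == label else (1.0 - acc) / (n - 1)
                posterior[s] *= likelihood
            total = sum(posterior.values())
            posterior = {s: p / total for s, p in posterior.items()}
            best = max(posterior, key=posterior.get)
            if posterior[best] >= threshold:                # confident enough
                return best, posterior[best], used
        best = max(posterior, key=posterior.get)
        return best, posterior[best], used

    if __name__ == "__main__":
        bees = [f"species_{i}" for i in range(1, 23)]      # 22 candidates
        votes = [("species_3", 0.7), ("species_3", 0.6),
                 ("species_5", 0.5), ("species_3", 0.8)]
        print(incremental_consensus(votes, bees))

In this toy run, two agreeing labels already push the posterior past the threshold, so only two of the four available labels are consumed, which is the kind of crowd-size saving the abstract describes.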
Supplemental Material
Supplemental movie, appendix, image, and software files for "Crowdsourcing Without a Crowd: Reliable Online Species Identification Using Bayesian Models to Minimize Crowd Size."