Abstract
When crowdsourcing the creation of machine learning datasets, statistical distributions that capture diverse answers can represent ambiguous data better than a single best answer. Unfortunately, collecting distributions is expensive: many responses must be gathered before a stable distribution forms. Despite this, the efficient collection of answer distributions (that is, ways to use less human effort to estimate the distribution that a large group of responses would eventually form) is an under-studied topic. In this paper, we demonstrate that this type of estimation is possible and characterize different elicitation approaches to guide the development of future systems. We investigate eight elicitation approaches along two dimensions: annotation granularity and estimation perspective. Annotation granularity is varied by annotating i) a single "best" label, ii) all relevant labels, iii) a ranking of all relevant labels, or iv) real-valued weights for all relevant labels. Estimation perspective is varied by prompting workers to respond either with their own answer or with an estimate of the answer(s) they expect other workers would provide. Our study collected ordinal annotations on the emotional valence of facial images from 1,960 crowd workers and found that, surprisingly, the most fine-grained elicitation methods were not the most accurate, despite workers spending more time providing answers. Instead, the most efficient approach was to ask workers to choose all relevant classes that others would have selected. This resulted in a 21.4% reduction in the human time required to reach the same performance as the baseline (i.e., selecting a single answer from one's own perspective). By analyzing cases in which finer-grained annotations degraded performance, we contribute to a better understanding of the trade-offs among answer elicitation approaches.
Our work makes it more tractable to use answer distributions in large-scale tasks such as ML training, and aims to spark future work on techniques that can efficiently estimate answer distributions.
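The four annotation granularities above can each be viewed as an aggregation rule that turns a batch of worker responses into a normalized answer distribution over the ordinal classes. The following is a minimal sketch in Python of what such aggregation could look like; the class names, the per-worker normalization choices, and the Borda-style rank weighting are illustrative assumptions, not the paper's exact method. A divergence measure such as KL can then score an estimated distribution against a large-sample reference.

```python
# Hypothetical sketch: aggregating crowd annotations of different
# granularities into an answer distribution over ordinal valence classes.
# Class names and weighting schemes are assumptions for illustration.
from collections import Counter
from math import log

CLASSES = ["very_negative", "negative", "neutral", "positive", "very_positive"]

def from_single_labels(labels):
    """Baseline: each worker picks one 'best' class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [counts[c] / total for c in CLASSES]

def from_multi_labels(label_sets):
    """Each worker selects all relevant classes; each worker's unit of
    mass is spread uniformly over their selections (an assumption)."""
    dist = dict.fromkeys(CLASSES, 0.0)
    for s in label_sets:
        for c in s:
            dist[c] += 1.0 / len(s)
    total = sum(dist.values())
    return [dist[c] / total for c in CLASSES]

def from_rankings(rankings):
    """Each worker ranks relevant classes best-first; Borda-style
    weights that sum to 1 per worker (an assumption)."""
    dist = dict.fromkeys(CLASSES, 0.0)
    for r in rankings:
        n = len(r)
        for i, c in enumerate(r):
            dist[c] += (n - i) / (n * (n + 1) / 2)
    total = sum(dist.values())
    return [dist[c] / total for c in CLASSES]

def from_weights(weight_dicts):
    """Each worker assigns real-valued weights; normalize per worker,
    then average across workers."""
    dist = dict.fromkeys(CLASSES, 0.0)
    for w in weight_dicts:
        z = sum(w.values())
        for c, v in w.items():
            dist[c] += v / z
    total = sum(dist.values())
    return [dist[c] / total for c in CLASSES]

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), smoothed so zero-mass estimates stay finite."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Example: a small multi-label batch scored against a larger reference sample.
est = from_multi_labels([{"neutral", "positive"}, {"positive"},
                         {"positive", "very_positive"}, {"neutral"}])
ref = from_single_labels(["positive"] * 50 + ["neutral"] * 30 +
                         ["very_positive"] * 20)
score = kl_divergence(ref, est)
```

Under these assumptions, the efficiency question the paper studies becomes: which aggregation rule, fed responses from the fewest workers, yields the lowest divergence from the reference distribution per unit of worker time?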
Index Terms: Efficient Elicitation Approaches to Estimate Collective Crowd Answers