Abstract
When crowdsourcing the creation of machine learning datasets, statistical distributions that capture diverse answers can represent ambiguous data better than a single best answer. Unfortunately, collecting distributions is expensive: many responses must be gathered before a stable distribution forms. Despite this, the efficient collection of answer distributions (that is, ways to use less human effort to estimate the distribution that a large group of responses would eventually form) is an under-studied topic. In this paper, we demonstrate that this type of estimation is possible and characterize different elicitation approaches to guide the development of future systems. We investigate eight elicitation approaches along two dimensions: annotation granularity and estimation perspective. Annotation granularity is varied by annotating i) a single "best" label, ii) all relevant labels, iii) a ranking of all relevant labels, or iv) real-valued weights for all relevant labels. Estimation perspective is varied by prompting workers to respond either with their own answer or with an estimate of the answer(s) they expect other workers would provide. Our study collected ordinal annotations on the emotional valence of facial images from 1,960 crowd workers and found that, surprisingly, the most fine-grained elicitation methods were not the most accurate, despite workers spending more time providing answers. Instead, the most efficient approach was to ask workers to choose all relevant classes that others would have selected. This resulted in a 21.4% reduction in the human time required to reach the same performance as the baseline (i.e., selecting a single answer from one's own perspective). By analyzing cases in which finer-grained annotations degraded performance, we contribute to a better understanding of the trade-offs among answer elicitation approaches.
Our work makes it more tractable to use answer distributions in large-scale tasks such as ML training, and aims to spark future work on techniques that can efficiently estimate answer distributions.
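The four annotation granularities above can each be viewed as an aggregation rule that turns a batch of worker responses into a normalized answer distribution over the ordinal classes. The following is a minimal sketch in Python of what such aggregation could look like; the class names, the per-worker normalization choices, and the Borda-style rank weighting are illustrative assumptions, not the paper's exact method. A divergence measure such as KL can then score an estimated distribution against a large-sample reference.

```python
# Hypothetical sketch: aggregating crowd annotations of different
# granularities into an answer distribution over ordinal valence classes.
# Class names and weighting schemes are assumptions for illustration.
from collections import Counter
from math import log

CLASSES = ["very_negative", "negative", "neutral", "positive", "very_positive"]

def from_single_labels(labels):
    """Baseline: each worker picks one 'best' class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [counts[c] / total for c in CLASSES]

def from_multi_labels(label_sets):
    """Each worker selects all relevant classes; each worker's unit of
    mass is spread uniformly over their selections (an assumption)."""
    dist = dict.fromkeys(CLASSES, 0.0)
    for s in label_sets:
        for c in s:
            dist[c] += 1.0 / len(s)
    total = sum(dist.values())
    return [dist[c] / total for c in CLASSES]

def from_rankings(rankings):
    """Each worker ranks relevant classes best-first; Borda-style
    weights that sum to 1 per worker (an assumption)."""
    dist = dict.fromkeys(CLASSES, 0.0)
    for r in rankings:
        n = len(r)
        for i, c in enumerate(r):
            dist[c] += (n - i) / (n * (n + 1) / 2)
    total = sum(dist.values())
    return [dist[c] / total for c in CLASSES]

def from_weights(weight_dicts):
    """Each worker assigns real-valued weights; normalize per worker,
    then average across workers."""
    dist = dict.fromkeys(CLASSES, 0.0)
    for w in weight_dicts:
        z = sum(w.values())
        for c, v in w.items():
            dist[c] += v / z
    total = sum(dist.values())
    return [dist[c] / total for c in CLASSES]

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), smoothed so zero-mass estimates stay finite."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Example: a small multi-label batch scored against a larger reference sample.
est = from_multi_labels([{"neutral", "positive"}, {"positive"},
                         {"positive", "very_positive"}, {"neutral"}])
ref = from_single_labels(["positive"] * 50 + ["neutral"] * 30 +
                         ["very_positive"] * 20)
score = kl_divergence(ref, est)
```

Under these assumptions, the efficiency question the paper studies becomes: which aggregation rule, fed responses from the fewest workers, yields the lowest divergence from the reference distribution per unit of worker time?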
Index Terms: Efficient Elicitation Approaches to Estimate Collective Crowd Answers