Efficient Elicitation Approaches to Estimate Collective Crowd Answers

Published: 07 November 2019

Abstract

When crowdsourcing the creation of machine learning datasets, statistical distributions that capture diverse answers can represent ambiguous data better than a single best answer. Unfortunately, collecting distributions is expensive because a large number of responses must be collected to form a stable distribution. Despite this, the efficient collection of answer distributions, that is, ways to use less human effort to estimate the distribution that a large group of responses would eventually form, is an under-studied topic. In this paper, we demonstrate that this type of estimation is possible and characterize different elicitation approaches to guide the development of future systems. We investigate eight elicitation approaches along two dimensions: annotation granularity and estimation perspective. Annotation granularity is varied by annotating i) a single "best" label, ii) all relevant labels, iii) a ranking of all relevant labels, or iv) real-valued weights for all relevant labels. Estimation perspective is varied by prompting workers to respond either with their own answer or with an estimate of the answer(s) they expect other workers would provide. Our study collected ordinal annotations on the emotional valence of facial images from 1,960 crowd workers and found that, surprisingly, the most fine-grained elicitation methods were not the most accurate, despite workers spending more time to provide answers. Instead, the most efficient approach was to ask workers to choose all relevant classes that others would have selected. This resulted in a 21.4% reduction in the human time required to reach the same performance as the baseline (i.e., selecting a single answer from their own perspective). By analyzing cases in which finer-grained annotations degraded performance, we contribute to a better understanding of the trade-offs between answer elicitation approaches. Our work makes it more tractable to use answer distributions in large-scale tasks such as ML training, and aims to spark future work on techniques that can efficiently estimate answer distributions.
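To make the estimation setting concrete, the following is a minimal sketch (not the authors' implementation) of how multi-label responses of the kind described above, where workers select all classes they expect others would choose, might be aggregated into an answer distribution and compared against the distribution formed by a large reference crowd. The five valence classes, the sample responses, and the use of KL divergence as the comparison metric are illustrative assumptions rather than details taken from the paper.

```python
# Sketch: aggregate a few multi-label crowd responses into a distribution
# over ordinal valence classes and compare it to a large-crowd reference.
from collections import Counter
import math

# Hypothetical 5-point valence scale (assumption, not the paper's exact labels).
CLASSES = ["very negative", "negative", "neutral", "positive", "very positive"]

def aggregate(responses):
    """Turn multi-label responses (each a set of selected classes) into a
    normalized probability distribution over CLASSES."""
    counts = Counter()
    for selected in responses:
        for label in selected:
            counts[label] += 1
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in CLASSES]

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), smoothed so empty bins do not produce infinities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical data: a reference distribution from a large crowd, and an
# estimate built from only four multi-label ("what others would pick") responses.
reference = [0.05, 0.15, 0.40, 0.30, 0.10]
few_responses = [
    {"neutral", "positive"},
    {"neutral"},
    {"negative", "neutral", "positive"},
    {"positive", "very positive"},
]
estimate = aggregate(few_responses)
print("estimated distribution:", estimate)
print("KL(reference || estimate):", kl_divergence(reference, estimate))
```

Under this framing, an elicitation approach is "efficient" if, for a fixed amount of worker time, it yields an estimate with lower divergence from the large-crowd reference distribution.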


Published in

Proceedings of the ACM on Human-Computer Interaction, Volume 3, Issue CSCW (November 2019), 5026 pages
EISSN: 2573-0142
DOI: 10.1145/3371885
Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History: Published 7 November 2019 in PACM HCI Volume 3, Issue CSCW
