ABSTRACT
Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out with this method at a fraction of the usual expense.
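As a rough illustration of what measuring agreement between non-expert annotations and gold standard labels can look like, the sketch below aggregates several non-expert labels per item and compares the result with a gold label. The abstract does not say how labels are combined, so the majority vote used here is an assumption, and the function names, item identifiers, and toy data are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def agreement_with_gold(annotations, gold):
    """Fraction of items whose aggregated non-expert label matches gold."""
    matches = sum(
        1
        for item, labels in annotations.items()
        if majority_vote(labels) == gold[item]
    )
    return matches / len(annotations)


# Hypothetical toy data: three non-expert labels per item (not from the paper).
annotations = {
    "pair-1": ["yes", "yes", "no"],
    "pair-2": ["no", "no", "no"],
    "pair-3": ["yes", "no", "no"],
    "pair-4": ["yes", "yes", "yes"],
}
gold = {"pair-1": "yes", "pair-2": "no", "pair-3": "yes", "pair-4": "yes"}

accuracy = agreement_with_gold(annotations, gold)
print(f"Agreement with gold: {accuracy:.2f}")
```

Majority vote suits categorical tasks such as textual entailment; for numeric ratings such as word similarity, averaging the ratings and correlating them with the gold scores would be a more natural choice.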