ABSTRACT
Manual annotation of natural language to capture linguistic information is essential for NLP tasks that rely on supervised machine learning of semantic knowledge. Judgements of meaning can be more or less subjective; in such cases, instead of a single correct label, the labels assigned may vary across annotators depending on their knowledge, age, gender, intuitions, background, and so on. We introduce a framework, Anveshan, in which we investigate annotator behavior to find outliers, cluster annotators by behavior, and identify confusable labels. We also investigate the effectiveness of using trained annotators versus a larger number of untrained annotators on a word sense annotation task. The annotation data come from a word sense disambiguation task for polysemous words, annotated both by trained annotators and by untrained annotators from Amazon's Mechanical Turk. Our results show that Anveshan is effective in uncovering patterns in annotator behavior, and that trained annotators are superior to a larger number of untrained annotators for this task.
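The abstract does not detail Anveshan's internals, but one of the analyses it names, finding outlier annotators, can be illustrated with a minimal sketch. The sketch below (all function names and the 0.4 threshold are hypothetical, not taken from the paper) computes pairwise Cohen's kappa between annotators who labeled the same items and flags any annotator whose mean agreement with the others falls below the threshold.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's (1960) kappa for two annotators' label sequences over the same items."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under chance, from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def outlier_annotators(labels, threshold=0.4):
    """Flag annotators whose mean pairwise kappa with all others is below threshold.

    `labels` maps an annotator id to that annotator's label sequence;
    all sequences cover the same items in the same order.
    """
    ids = list(labels)
    mean_kappa = {}
    for i in ids:
        ks = [cohen_kappa(labels[i], labels[j]) for j in ids if j != i]
        mean_kappa[i] = sum(ks) / len(ks)
    return [i for i in ids if mean_kappa[i] < threshold], mean_kappa

# Toy usage: three annotators agree perfectly, a fourth labels at chance.
labels = {
    "a1": ["s1", "s1", "s2", "s2", "s1", "s2"],
    "a2": ["s1", "s1", "s2", "s2", "s1", "s2"],
    "a3": ["s1", "s1", "s2", "s2", "s1", "s2"],
    "a4": ["s1", "s1", "s2", "s1", "s2", "s1"],
}
outliers, means = outlier_annotators(labels)
```

The same pairwise-kappa matrix could also serve as a similarity measure for clustering annotators by behavior, and a label-by-label confusion count across annotator pairs would surface confusable senses.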