ABSTRACT
Named entity recognition is now considered a fundamental task, but annotating named entities raises specific difficulties. These difficulties led us to ask the basic question of what annotators should annotate and, more importantly, for what purpose. We therefore identify the applications that use named entity recognition and, based on the actual needs of those applications, propose a semantic definition of the elements to annotate. Finally, we put forward a number of methodological recommendations to ensure a coherent and reliable annotation scheme.