ABSTRACT
We describe the OntoNotes methodology and its result: a large, multilingual, richly annotated corpus constructed at 90% interannotator agreement. An initial portion (300K words of English newswire and 250K words of Chinese newswire) will be made available to the community during 2007.
- OntoNotes: the 90% solution