ABSTRACT
We present an annotation scheme for complex predicates, understood as constructions with more than one lexical unit, each contributing part of the information normally associated with a single predicate. We discuss our annotation guidelines for four types of complex predicates (CPs) and the treatment of several difficult cases related to ambiguity, overlap and coordination. We then describe the process of marking up the Portuguese CINTIL corpus of 1M tokens (written and spoken) with a new layer of information regarding complex predicates. Finally, we present the outcomes of the annotation work and statistics on the types of CPs found in the corpus.
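The abstract treats a complex predicate as one annotation unit made of several lexical units. These units are recorded in a separate layer over the corpus tokens and later counted by type. The Python sketch below shows one possible standoff representation of such a layer. It is an illustration under stated assumptions, not the CINTIL format. The class `ComplexPredicate`, the functions `type_counts` and `overlapping`, and the type labels `LIGHT_VERB` and `CAUSATIVE` are all hypothetical, and the token indices are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexPredicate:
    """Hypothetical standoff record for one CP annotation."""
    cp_type: str                # hypothetical type label, not the scheme's real tag set
    token_ids: tuple[int, ...]  # positions of the lexical units; may be non-contiguous

    def __post_init__(self):
        # By definition a complex predicate involves more than one lexical unit.
        if len(self.token_ids) < 2:
            raise ValueError("a complex predicate needs at least two lexical units")


def type_counts(layer):
    """Count how often each CP type occurs in an annotation layer."""
    return Counter(cp.cp_type for cp in layer)


def overlapping(a, b):
    """Return True if two CP annotations share at least one token."""
    return bool(set(a.token_ids) & set(b.token_ids))


if __name__ == "__main__":
    # Illustrative layer: in "O João deu um passeio", tokens 2 ("deu") and
    # 4 ("passeio") form a discontinuous light verb construction.
    layer = [
        ComplexPredicate("LIGHT_VERB", (2, 4)),
        ComplexPredicate("CAUSATIVE", (7, 8)),
        ComplexPredicate("LIGHT_VERB", (12, 14)),
    ]
    print(type_counts(layer))  # Counter({'LIGHT_VERB': 2, 'CAUSATIVE': 1})
```

Storing token positions instead of a contiguous span allows a single CP to skip intervening material. It also allows two CPs to share a token, and `overlapping` detects that case. This is one way to represent the overlap and coordination cases the abstract mentions, although the paper's own treatment may differ.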