ABSTRACT
This paper presents our basic approach to creating Proposition Bank, which involves adding a layer of semantic annotation to the Penn English TreeBank. Without attempting to confirm or disconfirm any particular semantic theory, our goal is to provide consistent argument labeling that will facilitate the automatic extraction of relational data. An argument such as "the window" in "John broke the window" and in "The window broke" would receive the same label in both sentences. In order to ensure reliable human annotation, we provide our annotators with explicit guidelines for labeling all of the syntactic and semantic frames of each particular verb. We give several examples of these guidelines and discuss the inter-annotator agreement figures. We also discuss our current experiments on the automatic expansion of our verb guidelines based on verb class membership. Our current rate of progress and our consistency of annotation demonstrate the feasibility of the task.
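As a rough illustration of what consistent argument labeling buys, the sketch below encodes the two example sentences as labeled propositions and shows that the window carries the same label in both. The `Proposition` class, the helper functions, and the label names `Arg0` and `Arg1` are placeholders invented for this illustration; the abstract does not specify the label inventory or any data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Proposition:
    """One annotated verb instance: the predicate plus its labeled arguments."""
    verb: str
    arguments: Dict[str, str]  # label -> argument text


# Hypothetical labels (an assumption, not taken from the abstract):
# "Arg0" marks the breaker, "Arg1" marks the thing broken.
transitive = Proposition("break", {"Arg0": "John", "Arg1": "the window"})
intransitive = Proposition("break", {"Arg1": "the window"})


def label_of(prop: Proposition, text: str) -> Optional[str]:
    """Return the label assigned to `text`, or None if it is not an argument."""
    for label, span in prop.arguments.items():
        if span == text:
            return label
    return None


def extract_relations(props: List[Proposition]) -> List[Tuple[str, str, str]]:
    """Flatten propositions into (verb, label, text) triples."""
    return [
        (p.verb, label, span)
        for p in props
        for label, span in p.arguments.items()
    ]


if __name__ == "__main__":
    # The same argument gets the same label despite the syntactic alternation.
    print(label_of(transitive, "the window"))
    print(label_of(intransitive, "the window"))
    print(extract_relations([transitive, intransitive]))
```

Because the label is tied to the argument's role rather than its syntactic position, a downstream extractor can query for the broken thing with a single label, whether the sentence is transitive or intransitive.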