ABSTRACT
We present a rule-based shallow-parser compiler that makes it possible to generate a robust shallow parser for any language, even in the absence of training data, using a very limited number of rules that aim at identifying constituent boundaries. We contrast our approach with other approaches to shallow parsing (i.e., finite-state and probabilistic methods). We evaluate our tool on English (Penn Treebank) and on French (newspaper corpus "LeMonde") for several tasks (NP-chunking and "deeper" parsing).
A language-independent shallow-parser compiler