ABSTRACT
We present a new English→Czech machine translation system combining linguistically motivated layers of language description (as defined in the Prague Dependency Treebank annotation scenario) with statistical NLP approaches.
Index Terms
- TectoMT: highly modular MT system with tectogrammatics used as transfer layer