ABSTRACT
Statistical MT has made great progress in the last few years, but current translation models are weak on re-ordering and target language fluency. Syntactic approaches seek to remedy these problems. In this paper, we take the framework of Galley et al. (2004) for acquiring multi-level syntactic translation rules from aligned tree-string pairs, and present two main extensions of their approach. First, instead of merely computing a single derivation that minimally explains a sentence pair, we construct a large number of derivations that include contextually richer rules and account for multiple interpretations of unaligned words. Second, we propose probability estimates and a training procedure for weighting these rules. We contrast different approaches on real examples, show that our estimates based on multiple derivations favor phrasal re-orderings that are linguistically better motivated, and establish that our larger rules provide a 3.63 BLEU point increase over minimal rules.
- D. Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. of ACL.
- H. Fox. 2002. Phrasal cohesion and statistical machine translation. In Proc. of EMNLP, pages 304--311.
- M. Galley, M. Hopkins, K. Knight, and D. Marcu. 2004. What's in a translation rule? In Proc. of HLT/NAACL-04.
- J. Graehl and K. Knight. 2004. Training tree transducers. In Proc. of HLT/NAACL-04, pages 105--112.
- F. Och and H. Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30(4):417--449.
- A. Poutsma. 2000. Data-oriented translation. In Proc. of COLING, pages 635--641.
- D. Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377--404.
- K. Yamada and K. Knight. 2001. A syntax-based statistical translation model. In Proc. of ACL, pages 523--530.
- H. Zhang, L. Huang, D. Gildea, and K. Knight. 2006. Synchronous binarization for machine translation. In Proc. of HLT/NAACL.
- Scalable inference and training of context-rich syntactic translation models