Bean soup translation: flexible, linguistically-motivated syntax for machine translation
Publisher:
  • Ohio State University
  • Computer and Information Science Dept. 2036 Neil Avenue Columbus, OH
  • United States
ISBN:978-1-267-68348-9
Order Number:AAI3530214
Pages:172
Abstract

Machine translation (MT) systems attempt to translate texts from one language into another by translating words from a source language and rearranging them into fluent utterances in a target language. When the two languages organize concepts in very different ways, knowledge of their general sentence structure, or syntax, is crucial. The syntax of the target language is particularly useful, because it provides a means of testing whether the reorderings that a system might try are grammatically licensed. This thesis presents two novel syntactic techniques that aid in producing correct and grammatical translations. The first technique controls target language reordering using syntactic categories that span multiple words. The second technique complements the first by assessing the well-formedness of sequences formed by these reorderings using the same syntactic categories. These innovations are implemented in the context of statistical phrase-based machine translation [Zens et al., 2002; Koehn et al., 2003], which is the prevailing modern translation paradigm.

The main contribution of this thesis is to use the flexible syntax of Combinatory Categorial Grammar [CCG, Steedman, 2000] as the basis for deriving syntactic constituent labels for target strings in phrase-based systems, providing CCG labels for many target strings that traditional syntactic theories struggle to describe. These CCG labels are used to train novel syntax-based reordering and language models, which efficiently describe translation reordering patterns, as well as assess the grammaticality of target translations. The models are easily incorporated into phrase-based systems with minimal disruption to existing technology and achieve superior automatic metric scores and human evaluation ratings over a strong phrase-based baseline, as well as over syntax-based techniques that do not use CCG.

Contributors
  • The Ohio State University
  • BBN Technologies
