Bean soup translation: flexible, linguistically-motivated syntax for machine translation

January 2012

Author: Dennis Nolan Mehay, The Ohio State University

Adviser: William Schuler, The Ohio State University

Publisher: Ohio State University, Computer and Information Science Dept., 2036 Neil Avenue, Columbus, OH, United States

ISBN: 978-1-267-68348-9

Order Number: AAI3530214

Pages: 172

Abstract

Machine translation (MT) systems attempt to translate texts from one language into another by translating words from a source language and rearranging them into fluent utterances in a target language. When the two languages organize concepts in very different ways, knowledge of their general sentence structure, or syntax, is crucial. The syntax of the target language is particularly useful, because it provides a means of testing whether the reorderings that a system might try are grammatically licensed. This thesis presents two novel syntactic techniques that aid in producing correct and grammatical translations. The first technique controls target language reordering using syntactic categories that span multiple words. The second technique complements the first by assessing the well-formedness of sequences formed by these reorderings using the same syntactic categories. These innovations are implemented in the context of statistical phrase-based machine translation [Zens et al., 2002; Koehn et al., 2003], which is the prevailing modern translation paradigm.

The main contribution of this thesis is to use the flexible syntax of Combinatory Categorial Grammar [CCG, Steedman, 2000] as the basis for deriving syntactic constituent labels for target strings in phrase-based systems, providing CCG labels for many target strings that traditional syntactic theories struggle to describe. These CCG labels are used to train novel syntax-based reordering and language models, which efficiently describe translation reordering patterns, as well as assess the grammaticality of target translations. The models are easily incorporated into phrase-based systems with minimal disruption to existing technology and achieve superior automatic metric scores and human evaluation ratings over a strong phrase-based baseline, as well as over syntax-based techniques that do not use CCG.
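The idea of assessing the well-formedness of a reordered target sequence through its syntactic categories can be illustrated with a toy sketch. The code below is not the thesis's model: it assumes a plain bigram language model with add-one smoothing over CCG-style category labels, and the training sequences, label inventory, and function names (`train_bigram_lm`, `log_prob`) are all invented for illustration.

```python
import math
from collections import Counter


def train_bigram_lm(label_sequences):
    """Count context labels and label bigrams, padding with boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for seq in label_sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        contexts.update(padded[:-1])             # every label that precedes another
        bigrams.update(zip(padded, padded[1:]))  # adjacent label pairs
    # Labels that can be predicted: all observed labels plus the end marker.
    vocab = {label for seq in label_sequences for label in seq} | {"</s>"}
    return contexts, bigrams, vocab


def log_prob(seq, model):
    """Add-one smoothed log-probability of a sequence of category labels."""
    contexts, bigrams, vocab = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))
    return total


# Hypothetical training data: one CCG-style label per target phrase.
training = [
    ["NP", "(S\\NP)/NP", "NP"],
    ["NP", "(S\\NP)/NP", "NP/N", "N"],
    ["NP/N", "N", "S\\NP"],
]
model = train_bigram_lm(training)

subject_first = ["NP", "(S\\NP)/NP", "NP"]  # subject, transitive verb, object
verb_first = ["(S\\NP)/NP", "NP", "NP"]     # same labels in a different order

print(log_prob(subject_first, model), log_prob(verb_first, model))
```

On this toy data the subject-first ordering receives the higher score, which is the kind of signal a phrase-based decoder could use as one feature among many when ranking candidate reorderings.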

Contributors

William Edward Schuler
The Ohio State University
- Publication years: 1998 - 2012
- Publication count: 29
- Citation count: 200
- Available for download: 26
- Downloads (cumulative): 5,263
- Downloads (12 months): 438
- Downloads (6 weeks): 81
- Average downloads per article: 202
- Average citations per article: 7

Dennis Nolan Mehay
BBN Technologies
- Publication years: 2008 - 2015
- Publication count: 7
- Citation count: 10
- Available for download: 5
- Downloads (cumulative): 800
- Downloads (12 months): 83
- Downloads (6 weeks): 14
- Average downloads per article: 160
- Average citations per article: 1

Recommendations

Dependency treelet translation: the convergence of statistical and example-based machine-translation?

We describe a novel approach to MT that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with ...
N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination
EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven ...
Data-Oriented Translation
COLING '00: Proceedings of the 18th conference on Computational linguistics - Volume 2

In this article, we present a statistical approach to machine translation that is based on Data-Oriented Parsing: Data-Oriented Translation (DOT). In DOT, we use linked subtree pairs for creating a derivation of a source sentence. Each linked subtree ...