Findings of the 2011 Workshop on Statistical Machine Translation

Authors:
Chris Callison-Burch (Johns Hopkins University), Philipp Koehn (University of Edinburgh), Christof Monz (University of Amsterdam), Omar F. Zaidan (Johns Hopkins University)

WMT '11: Proceedings of the Sixth Workshop on Statistical Machine Translation, July 2011, Pages 22–64

Published: 30 July 2011

ABSTRACT

This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality for 21 evaluation metrics. This year featured a Haitian Creole to English task translating SMS messages sent to an emergency response service in the aftermath of the Haitian earthquake. We also conducted a pilot 'tunable metrics' task to test whether optimizing a fixed system to different metrics would result in perceptibly different translation quality.

Published in
WMT '11: Proceedings of the Sixth Workshop on Statistical Machine Translation
July 2011
575 pages
ISBN: 9781937284121
Program Chairs: Chris Callison-Burch (Johns Hopkins University), Philipp Koehn (University of Edinburgh), Christof Monz (University of Amsterdam), Omar F. Zaidan (Johns Hopkins University)
Publisher: Association for Computational Linguistics, United States