ABSTRACT
This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the resulting rankings of these systems to measure how strongly 21 automatic evaluation metrics correlate with human judgments of translation quality. This year featured a Haitian Creole-to-English task in which systems translated SMS messages sent to an emergency response service in the aftermath of the Haitian earthquake. We also conducted a pilot 'tunable metrics' task to test whether optimizing a fixed system toward different metrics would result in perceptibly different translation quality.
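The correlation step can be illustrated with a minimal sketch. It compares two rankings of the same systems, one from a metric and one from human judgments, using Spearman's rank correlation coefficient, ρ = 1 − 6Σd²/(n(n² − 1)). This is a standard choice for system-level comparisons and is assumed here, not quoted from the paper. The sketch also assumes there are no tied scores. The function names and the example scores are hypothetical.

```python
from typing import List, Sequence


def rank_descending(scores: Sequence[float]) -> List[int]:
    """Return 1-based ranks where rank 1 is the highest score.

    Assumes all scores are distinct; ties are rejected rather than averaged.
    """
    if len(set(scores)) != len(scores):
        raise ValueError("tied scores are not handled by this sketch")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rho(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rank correlation between metric and human system-level scores."""
    n = len(metric_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equal-length score lists covering at least 2 systems")
    metric_ranks = rank_descending(metric_scores)
    human_ranks = rank_descending(human_scores)
    # Sum of squared differences between each system's two ranks.
    d_squared = sum((m - h) ** 2 for m, h in zip(metric_ranks, human_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for four systems (not data from the paper).
human = [0.62, 0.55, 0.48, 0.40]   # e.g. fraction of pairwise comparisons won
metric = [0.30, 0.28, 0.29, 0.21]  # e.g. a BLEU-like corpus score
print(spearman_rho(metric, human))
```

For these hypothetical scores, the two rankings differ only by a swap of the second and third systems, so Σd² = 2 and ρ = 1 − 12/60 = 0.8. A value of 1 means the metric orders the systems exactly as the human judges did, and −1 means it reverses their order.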
- Findings of the 2011 Workshop on Statistical Machine Translation