DOI: 10.5555/2132960.2132964

Findings of the 2011 Workshop on Statistical Machine Translation

Published: 30 July 2011

ABSTRACT

This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality for 21 evaluation metrics. This year featured a Haitian Creole to English task translating SMS messages sent to an emergency response service in the aftermath of the Haitian earthquake. We also conducted a pilot 'tunable metrics' task to test whether optimizing a fixed system to different metrics would result in perceptibly different translation quality.
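As a rough illustration of what measuring the correlation between an automatic metric and a human ranking can look like at the system level, the sketch below computes Spearman's rank correlation coefficient. The abstract does not name the correlation statistic, so the choice of Spearman's rho is an assumption, as are the function names and the example scores, which are invented for illustration and are not results from the paper. The simplified formula also assumes there are no tied scores.

```python
def rank(values):
    """Return 1-based ranks, with rank 1 for the highest score (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(human_scores, metric_scores):
    """Spearman's rho between two equally long score lists (no ties assumed)."""
    n = len(human_scores)
    if n != len(metric_scores) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    human_ranks = rank(human_scores)
    metric_ranks = rank(metric_scores)
    # Sum of squared rank differences, one term per system.
    d_squared = sum((h - m) ** 2 for h, m in zip(human_ranks, metric_ranks))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical scores for four systems (illustrative only).
human = [0.62, 0.55, 0.48, 0.30]   # e.g. share of pairwise comparisons won
metric = [27.1, 28.4, 22.0, 19.5]  # e.g. an automatic metric score
print(spearman_rho(human, metric))
```

A value near 1 means the metric orders the systems almost as the human judges did, and a value near -1 means it orders them in reverse.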


Published in: WMT '11: Proceedings of the Sixth Workshop on Statistical Machine Translation
July 2011, 575 pages
ISBN: 9781937284121
Publisher: Association for Computational Linguistics, United States
