ABSTRACT
Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model.
- Yaser Al-Onaizan and Kishore Papineni. 2006. Distortion models for statistical machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 529--536. Google ScholarDigital Library
- Jeff A. Bilmes and Katrin Kirchhoff. 2003. Factored language models and generalized parallel backoff. In Proceedings of the the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 4--6. Google ScholarDigital Library
- Alexandra Birch, Miles Osborne, and Philipp Koehn. 2007. CCG supertags in factored statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 9--16. Google ScholarDigital Library
- Hélène Bonneau-Maynard, Alexandre Allauzen, Daniel Déchelotte, and Holger Schwenk. 2007. Combining morphosyntactic enriched representation with n-best reranking in statistical translation. In Proceedings of the NAACL-HLT Workshop on Syntax and Structure in Statistical Translation, pages 65--71. Google ScholarDigital Library
- Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 858--867.Google Scholar
- Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference, pages 132--139. Google ScholarDigital Library
- Stanley F. Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 13(4):359--393.Google ScholarDigital Library
- David Chiang, Kevin Knight, and Wei Wang. 2009. 11,001 new features for statistical machine translation. In Proceedings of the North American Chapter of the Association for Computational Linguistics, pages 218--226. Google ScholarDigital Library
- Michael Collins, Brian Roark, and Murat Saraclar. 2005. Discriminative syntactic language modeling for speech recognition. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 507--514. Google ScholarDigital Library
- Saša Hasan, Oliver Bender, and Hermann Ney. 2006. Reranking translation hypotheses using structural properties. In Proceedings of the EACL Workshop on Learning Structured Information in Natural Language Applications, pages 41--48.Google Scholar
- Peter Heeman. 1998. POS tagging versus classes in language modeling. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 179--187.Google Scholar
- Katrin Kirchhoff and Mei Yang. 2005. Improved language modeling for statistical machine translation. In Proceedings of the ACL Workshop on Building and Using Parallel Texts, pages 125--128. Google ScholarDigital Library
- Philipp Koehn and Hieu Hoang. 2007. Factored translation models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 868--876.Google Scholar
- Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 48--54. Google ScholarDigital Library
- Philipp Koehn, Amittai Axelrod, Alexandra Birch Mayne, Chris Callison-Burch, Miles Osborne, and David Talbot. 2005. Edinburgh system description for the 2005 IWSLT speech translation evaluation. In Proceedings of the International Workshop on Spoken Language Translation.Google Scholar
- Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 177--180. Google ScholarDigital Library
- Philipp Koehn, Abhishek Arun, and Hieu Hoang. 2008. Towards better machine translation quality for the german--english language pairs. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 139--142. Google ScholarDigital Library
- Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 388--395.Google Scholar
- Roland Kuhn. 1988. Speech recognition and the frequency of recently used words: a modified Markov model for natural language. In Proceedings of the 12th conference on Computational Linguistics, pages 348--350. Google ScholarDigital Library
- Zhifei Li and Sanjeev Khudanpur. 2008. Large-scale discriminative n-gram language models for statisti- cal machine translation. In Proceedings of AMTA, pages 133--142.Google Scholar
- Christopher D. Manning. 1993. Automatic acquisition of a large subcategorization dictionary from corpora. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 235--242. Google ScholarDigital Library
- Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of english: the penn treebank. Computational Linguistics, 19:313--330. Google ScholarDigital Library
- Eric W. Noreen. 1989. Computer Intensive Methods for Testing Hypotheses. An Introduction. Wiley-Interscience.Google Scholar
- Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, ALex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. 2004. A smorgasbord of features for statistical machine translation. In Proceedings of the 2004 Meeting of the North American chapter of the Association for Computational Linguistics, pages 161--168.Google Scholar
- Franz-Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), pages 160--167. Google ScholarDigital Library
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2001. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL 2001), pages 311--318. Google ScholarDigital Library
- Matt Post and Daniel Gildea. 2008. Parsers as language models for statistical machine translation. In Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas, pages 172--181.Google Scholar
- Stefan Riezler and John T. Maxwell. 2005. On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 57--64.Google Scholar
- Holger Schwenk. 2007. Continuous space language models. Computer Speech and Language, 21:492--518. Google ScholarDigital Library
- Libin Shen, Jinxi Xu, Bing Zhang, Spyros Matsoukas, and Ralph Weischedel. 2009. Effective use of linguistic and contextual information for statistical machine translation. In Proceedings of Empirical Methods in Natural Language Processing, pages 72--80. Google ScholarDigital Library
- Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas, pages 223--231.Google Scholar
- Andreas Stolcke. 2002. SRILM---an extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing, pages 901--904.Google Scholar
- Christoph Tillmann. 2004. A unigram orientation model for statistical machine translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference (HLT/NAACL-04), pages 101--104. Google ScholarDigital Library
- Wen Wang and Mary P. Harper. 2002. The Super-ARV language model: investigating the effectiveness of tightly integrating multiple knowledge sources. In Proceedings of Empirical Methods in Natural Language Processing, pages 238--247. Google ScholarDigital Library
- Wen Wang, Andreas Stolcke, and Jing Zheng. 2007. Reranking machine translation hypotheses with structured and web-based language models. In IEEE Workshop on Automatic Speech Recognition & Understanding, pages 159--164.Google ScholarCross Ref
- Jing Zheng, Necip Fazil Ayan, Wen Wang, Dimitra Vergyri, Nicolas Scheffer, and Andreas Stolcke. 2008. SRI systems in the NIST MT08 Evaluation. In Proceedings of the NIST 2008 Open MT Evaluation Workshop.Google Scholar
- Statistical machine translation with local language models
Recommendations
Factored bilingual n-gram language models for statistical machine translation
In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are based on mappings of sequences of raw ...
Preordering using a Target-Language Parser via Cross-Language Syntactic Projection for Statistical Machine Translation
When translating between languages with widely different word orders, word reordering can present a major challenge. Although some word reordering methods do not employ source-language syntactic structures, such structures are inherently useful for word ...
Comments