ABSTRACT
We describe a new pruning approach that removes phrase pairs from the translation models of statistical machine translation systems. The approach applies the original translation system to a large amount of text and collects usage statistics for the phrase pairs. From these statistics, the relevance of each phrase pair can be estimated. We test the approach against a strong baseline based on previous work, and it shows significant improvements.
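The usage-statistics idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the decoder reports which phrase pairs appear in each best translation and that the raw usage count serves as the relevance estimate. Under those assumptions it keeps a fixed number of the most-used pairs. The function names `usage_counts` and `prune`, the tie-breaking by translation score, and the dictionary layout of the phrase table are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical representation: a phrase pair is a (source, target) string tuple.
PhrasePair = Tuple[str, str]


def usage_counts(decoded: Iterable[Sequence[PhrasePair]]) -> Counter:
    """Count how often each phrase pair is used across decoded sentences.

    `decoded` is assumed to hold, for every translated sentence, the phrase
    pairs the decoder used in its best hypothesis.
    """
    counts: Counter = Counter()
    for used_pairs in decoded:
        counts.update(used_pairs)
    return counts


def prune(phrase_table: Dict[PhrasePair, float], counts: Counter,
          keep: int) -> Dict[PhrasePair, float]:
    """Keep the `keep` most-used phrase pairs (assumed relevance = usage count).

    Ties in usage are broken by the pair's original score; pairs never used
    get a count of 0 from the Counter and are ranked last.
    """
    ranked = sorted(phrase_table,
                    key=lambda pair: (counts[pair], phrase_table[pair]),
                    reverse=True)
    kept = set(ranked[:keep])
    # Preserve the original table order for the surviving entries.
    return {pair: score for pair, score in phrase_table.items() if pair in kept}


if __name__ == "__main__":
    # Toy phrase table with made-up scores, for illustration only.
    table = {
        ("das haus", "the house"): 0.8,
        ("haus", "house"): 0.6,
        ("das", "the"): 0.5,
        ("das haus", "the home"): 0.2,
    }
    # Phrase pairs used in the best translation of three input sentences.
    decoded = [
        [("das haus", "the house")],
        [("das", "the"), ("haus", "house")],
        [("das haus", "the house")],
    ]
    print(prune(table, usage_counts(decoded), keep=2))
```

A usage count alone ignores *how much* each pair contributes to translation quality. A weighted relevance score could replace `counts[pair]` in the sort key without changing the rest of the sketch.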
- Translation model pruning via usage statistics for statistical machine translation