Translation model pruning via usage statistics for statistical machine translation

Authors: Matthias Eck, Stephan Vogel, Alex Waibel (all Carnegie Mellon University, Pittsburgh)

NAACL-Short '07: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, April 2007, Pages 21–24

Published: 22 April 2007

ABSTRACT

We describe a new pruning approach that removes phrase pairs from the translation models of statistical machine translation systems. The approach applies the original translation system to a large amount of text and calculates usage statistics for the phrase pairs. Using these statistics, the relevance of each phrase pair can be estimated. The approach is tested against a strong baseline based on previous work and shows significant improvements.
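
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how relevance is computed from the usage statistics, so this sketch simply assumes that relevance equals the raw number of times a phrase pair was used in the decoder's chosen translations. The names `count_usage`, `prune_phrase_table`, and the shape of the decoder log are hypothetical.

```python
from collections import Counter

def count_usage(used_pairs_per_sentence):
    """Count how often each phrase pair occurs in the decoder's chosen translations.

    `used_pairs_per_sentence` is an assumed log format: one list of
    (source_phrase, target_phrase) tuples per translated sentence.
    """
    usage = Counter()
    for pairs in used_pairs_per_sentence:
        usage.update(pairs)
    return usage

def prune_phrase_table(phrase_table, usage, keep):
    """Keep the `keep` phrase pairs with the highest usage count.

    Assumption: relevance is the raw usage count. Pairs with equal counts
    keep their original table order, because Python's sort is stable.
    """
    ranked = sorted(phrase_table, key=lambda pair: usage[pair], reverse=True)
    return ranked[:keep]

# Toy phrase table and decoder log (invented data, for illustration only).
phrase_table = [
    ("das haus", "the house"),
    ("das haus", "the home"),
    ("haus", "house"),
    ("das", "that"),
]
decoder_log = [
    [("das haus", "the house")],
    [("das haus", "the house"), ("haus", "house")],
    [("haus", "house")],
]

usage = count_usage(decoder_log)
pruned = prune_phrase_table(phrase_table, usage, keep=2)
print(pruned)
```

In this toy run, the two phrase pairs that the decoder never selected are the ones dropped. A real system would likely combine several statistics rather than a single count, which is left open here.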

Published in
NAACL-Short '07: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
April 2007, 228 pages
General Chair: Candace Sidner
Program Chairs: Tanja Schultz, Matthew Stone, ChengXiang Zhai
Publisher: Association for Computational Linguistics, United States

Acceptance Rates
Overall Acceptance Rate: 21 of 29 submissions, 72%