Abstract
Word reordering is a difficult task when translating between languages with widely different word orders, such as Japanese and English. A previously proposed post-ordering method for Japanese-to-English translation first translates a Japanese sentence into a sequence of English words in a word order similar to that of Japanese, then reorders that sequence into English word order. We employed this post-ordering framework and improved its reordering method. The existing post-ordering method reorders the sequence of English words via statistical machine translation (SMT), whereas our method reorders it by (1) parsing the sequence with an inversion transduction grammar (ITG) to obtain syntactic structures similar to Japanese syntactic structures, and (2) transferring the obtained structures into English syntactic structures according to the ITG. Experiments on Japanese-to-English patent translation demonstrated the effectiveness of our method: both the RIBES and BLEU scores improved over those of the compared methods.
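The transfer step (2) can be illustrated with a minimal sketch. It assumes that parsing has already produced a binary tree over the Japanese-ordered English words, and that each internal node is marked either straight (children keep their order) or inverted (children swap). The `Node` class, the labels `"S"` and `"I"`, and the example sentence are hypothetical choices made for illustration; they are not taken from the paper, and the sketch does not cover how the parser itself is trained or run.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    # Hypothetical labels: "S" = straight (keep child order),
    # "I" = inverted (swap child order), as in a binary ITG.
    label: str
    left: "Tree"
    right: "Tree"


# A tree is either a single word (leaf) or a binary node.
Tree = Union[str, Node]


def reorder(tree: Tree) -> List[str]:
    """Read off the target-order word sequence from a labeled binary tree."""
    if isinstance(tree, str):
        return [tree]
    left = reorder(tree.left)
    right = reorder(tree.right)
    # An inverted node emits its right child before its left child.
    if tree.label == "I":
        return right + left
    return left + right


# Japanese-ordered English input: "he a book bought"
example = Node("S", "he", Node("I", Node("S", "a", "book"), "bought"))
print(" ".join(reorder(example)))
```

Because every node is binary and carries only a straight/inverted decision, reordering is a single traversal of the tree; the difficulty lies in obtaining a good parse, not in applying it.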
Index Terms
- Post-Ordering by Parsing with ITG for Japanese-English Statistical Machine Translation