Abstract
This article addresses machine translation from Chinese to Catalan using neural pivot strategies trained without any direct Chinese-Catalan parallel data. Catalan is linguistically very close to Spanish, which motivates using Spanish as the pivot language. As the neural architecture, we use the Transformer, the current state of the art, which relies solely on attention mechanisms. Additionally, this work releases new resources to the community: a human-developed gold standard of 4,000 sentences pairing Catalan with Chinese and with all the other official languages of the United Nations (Arabic, English, French, Russian, and Spanish). Results show that the standard pseudo-corpus (synthetic pivot) approach outperforms the cascade approach.
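The two pivot strategies compared above differ in *when* the pivot language is used. The following is a minimal sketch, not the authors' implementation. It assumes the common pseudo-corpus variant in which the Spanish side of a Chinese-Spanish parallel corpus is machine-translated into Catalan to obtain synthetic Chinese-Catalan training pairs. The translator functions are hypothetical stand-ins for trained Transformer systems; here they are toy dictionary lookups so that the example runs.

```python
from typing import Callable, List, Tuple

# A translator maps one sentence to one sentence. In practice this would be a
# trained NMT model; here it is a hypothetical placeholder.
Translator = Callable[[str], str]


def cascade_translate(zh_sentence: str, zh_to_es: Translator, es_to_ca: Translator) -> str:
    """Cascade: chain two systems at inference time (Chinese -> Spanish -> Catalan)."""
    spanish = zh_to_es(zh_sentence)
    return es_to_ca(spanish)


def build_pseudo_corpus(
    zh_es_pairs: List[Tuple[str, str]], es_to_ca: Translator
) -> List[Tuple[str, str]]:
    """Pseudo-corpus: translate the Spanish side of a Chinese-Spanish corpus into
    Catalan, yielding synthetic Chinese-Catalan pairs for training a direct model."""
    return [(zh, es_to_ca(es)) for zh, es in zh_es_pairs]


# Toy translators (hypothetical): unknown inputs are returned unchanged.
ZH_ES = {"你好世界": "hola mundo"}
ES_CA = {"hola mundo": "hola món"}


def toy_zh_to_es(s: str) -> str:
    return ZH_ES.get(s, s)


def toy_es_to_ca(s: str) -> str:
    return ES_CA.get(s, s)


if __name__ == "__main__":
    print(cascade_translate("你好世界", toy_zh_to_es, toy_es_to_ca))
    print(build_pseudo_corpus([("你好世界", "hola mundo")], toy_es_to_ca))
```

In the cascade, errors from the first system propagate into the second at every translation request. The pseudo-corpus approach instead spends the pivot step once, offline, and then trains a single direct Chinese-to-Catalan model on the synthetic pairs.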
Index Terms
- Chinese-Catalan: A Neural Machine Translation Approach Based on Pivoting and Attention Mechanisms