Chinese-Catalan: A Neural Machine Translation Approach Based on Pivoting and Attention Mechanisms

Authors:
Marta R. Costa-Jussà, Noé Casas, Carlos Escolano, José A. R. Fonollosa

Universitat Politècnica de Catalunya, Barcelona, Spain

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4, Article No. 43, pp. 1–8. https://doi.org/10.1145/3312575

Published: 22 April 2019

Abstract

This article addresses machine translation from Chinese to Catalan using neural pivot strategies trained without any direct parallel data. Catalan is linguistically very similar to Spanish, which motivates the use of Spanish as the pivot language. As the neural architecture, we use the Transformer, a state-of-the-art model based solely on attention mechanisms. Additionally, this work provides a new resource to the community: a human-developed gold standard of 4,000 sentences between Catalan and Chinese, as well as between Catalan and the other official United Nations languages (Arabic, English, French, Russian, and Spanish). Results show that the standard pseudo-corpus (synthetic) pivot approach performs better than the cascade approach.
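The two pivot strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word-level lexicons, the helper names (`make_toy_translator`, `cascade`, `build_pseudo_corpus`), and the example sentence are all hypothetical stand-ins for trained Chinese-Spanish and Spanish-Catalan Transformer systems. In the cascade approach, two systems are chained at inference time. In the pseudo-corpus approach, the Spanish side of a Chinese-Spanish corpus is translated into Catalan once, and the resulting synthetic Chinese-Catalan pairs would then be used to train a direct system.

```python
from typing import Callable, Dict, List, Tuple

# A translator is modeled as any function from sentence to sentence.
Translator = Callable[[str], str]


def make_toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Hypothetical stand-in for a trained NMT system: word-by-word lookup."""
    def translate(sentence: str) -> str:
        # Unknown tokens are copied through unchanged.
        return " ".join(lexicon.get(tok, tok) for tok in sentence.split())
    return translate


def cascade(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Cascade pivoting: chain source->pivot and pivot->target at inference."""
    def translate(sentence: str) -> str:
        return pivot_to_tgt(src_to_pivot(sentence))
    return translate


def build_pseudo_corpus(
    src_pivot_pairs: List[Tuple[str, str]],
    pivot_to_tgt: Translator,
) -> List[Tuple[str, str]]:
    """Pseudo-corpus pivoting: translate the pivot side of a source-pivot
    corpus into the target language, giving synthetic source-target pairs
    on which a direct source->target system could be trained."""
    return [(src, pivot_to_tgt(pivot)) for src, pivot in src_pivot_pairs]


# Toy lexicons (assumed for illustration only).
zh_to_es = make_toy_translator({"你好": "hola", "世界": "mundo"})
es_to_ca = make_toy_translator({"hola": "hola", "mundo": "món"})

# Cascade: two systems are run in sequence for every input sentence.
zh_to_ca_cascade = cascade(zh_to_es, es_to_ca)
cascade_output = zh_to_ca_cascade("你好 世界")

# Pseudo-corpus: the Spanish side is translated once, before training.
zh_es_corpus = [("你好 世界", "hola mundo")]
pseudo_corpus = build_pseudo_corpus(zh_es_corpus, es_to_ca)
```

One practical difference the sketch makes visible: the cascade needs both systems at translation time and passes any first-stage error on to the second stage, whereas the pseudo-corpus approach needs only the final direct system once the synthetic data has been built.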

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 18, Issue 4
December 2019
305 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3327969
Editor: Nianwen Xue, Brandeis University, Waltham, USA
Copyright © 2019 Owner/Author
This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 April 2019
- Accepted: 1 February 2019
- Revised: 1 December 2018
- Received: 1 June 2018
Author Tags
Chinese-Catalan
Neural machine translation
pivot approaches
transformer
Qualifiers
- note
- Research
- Refereed