Open Access

Chinese-Catalan: A Neural Machine Translation Approach Based on Pivoting and Attention Mechanisms

Published: 22 April 2019

Abstract

This article addresses machine translation from Chinese to Catalan using neural pivot strategies trained without any direct parallel data. Catalan is linguistically very similar to Spanish, which motivates the use of Spanish as the pivot language. As the neural architecture, we use the current state-of-the-art Transformer model, which is based solely on attention mechanisms. Additionally, this work provides new resources to the community, consisting of a human-developed gold standard of 4,000 sentences between Catalan and Chinese as well as all the other official United Nations languages (Arabic, English, French, Russian, and Spanish). Results show that the standard pseudo-corpus (synthetic) pivot approach performs better than the cascade approach.
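The two pivot strategies compared in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: cascade chains a Chinese-to-Spanish system and a Spanish-to-Catalan system at inference time, while the pseudo-corpus approach translates the Spanish side of a Chinese-Spanish parallel corpus into Catalan to obtain synthetic training pairs for a direct system. The function names and the dictionary-based toy translators below are hypothetical stand-ins for trained models, not the article's implementation.

```python
from typing import Callable, Dict, List, Tuple

# A translator is modeled as any function from a sentence to a sentence.
Translator = Callable[[str], str]


def make_lookup_translator(table: Dict[str, str]) -> Translator:
    """Build a toy translator from a lookup table (hypothetical stand-in
    for a trained NMT model); unknown sentences are returned unchanged."""
    return lambda sentence: table.get(sentence, sentence)


def cascade_translate(zh_sentence: str,
                      zh_to_es: Translator,
                      es_to_ca: Translator) -> str:
    """Cascade pivoting: Chinese -> Spanish -> Catalan at inference time."""
    return es_to_ca(zh_to_es(zh_sentence))


def build_pseudo_corpus(zh_es_pairs: List[Tuple[str, str]],
                        es_to_ca: Translator) -> List[Tuple[str, str]]:
    """Pseudo-corpus pivoting: translate the Spanish side of a
    Chinese-Spanish corpus into Catalan, yielding synthetic
    Chinese-Catalan pairs for training a direct system."""
    return [(zh, es_to_ca(es)) for zh, es in zh_es_pairs]


# Toy data, for illustration only.
zh_to_es = make_lookup_translator({"你好": "hola", "谢谢": "gracias"})
es_to_ca = make_lookup_translator({"hola": "hola", "gracias": "gràcies"})

cascade_output = cascade_translate("谢谢", zh_to_es, es_to_ca)
pseudo_corpus = build_pseudo_corpus([("你好", "hola"), ("谢谢", "gracias")],
                                    es_to_ca)
```

In the cascade setting two systems run for every input sentence, so errors from the first step propagate to the second. In the pseudo-corpus setting the Spanish-to-Catalan system is used only once, offline, and the resulting pairs would then train a single Chinese-to-Catalan model.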

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4
      December 2019
      305 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3327969

      Copyright © 2019 Owner/Author

      This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 April 2019
      • Accepted: 1 February 2019
      • Revised: 1 December 2018
      • Received: 1 June 2018

      Qualifiers

      • note
      • Research
      • Refereed
