
Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese

Published: 01 February 2014

Abstract

So far, most research on temporal tagging has focused on English text documents, and hardly any multilingual temporal taggers support more than two languages. Recently, the temporal tagger HeidelTime has been made publicly available. It supports the integration of new languages through the development of language-dependent resources, without any modification of the source code.

In this article, we describe our work on developing such resources for two Asian and two Romance languages: Arabic, Vietnamese, Spanish, and Italian. While temporal tagging of the two Romance languages has been addressed before, there has been almost no research on Arabic and Vietnamese temporal tagging so far. Furthermore, we analyze language-dependent challenges for temporal tagging and explain the strategies we followed to address them. Our evaluation results on publicly available and newly annotated corpora demonstrate the high quality of our new resources for the four languages, which we make publicly available to the research community.
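The abstract's central design point is that a language-independent tagging engine can be extended to new languages purely by supplying language-dependent resources. The Python sketch below illustrates that separation under stated assumptions. The resource layout (`RESOURCES`, the `%MONTH` placeholder, the `Rule` class, and the `tag` and `month_year` functions) is hypothetical and only loosely inspired by the general idea; it is not HeidelTime's actual resource format or API. The sketch covers a single rule type (month plus year) and normalizes matches to a TIMEX3-style `DATE` value.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical rule format: a regex template plus a normalization function.
@dataclass
class Rule:
    pattern: str      # regex template; "%MONTH" is filled from the language's month list
    timex_type: str   # TIMEX3-style type, e.g. "DATE"
    normalize: Callable[[re.Match, Dict[str, int]], str]

def month_year(m: re.Match, months: Dict[str, int]) -> str:
    # Normalize "<month> <year>" to a value such as "2013-03".
    return f"{m.group('year')}-{months[m.group('month').lower()]:02d}"

EN_MONTHS = ["january", "february", "march", "april", "may", "june", "july",
             "august", "september", "october", "november", "december"]
ES_MONTHS = ["enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
             "agosto", "septiembre", "octubre", "noviembre", "diciembre"]

# Language-dependent resources (hypothetical layout). Adding a language means
# adding an entry here; the engine function below stays unchanged.
RESOURCES = {
    "english": {
        "months": {name: i for i, name in enumerate(EN_MONTHS, start=1)},
        "rules": [Rule(r"\b(?P<month>%MONTH) (?P<year>\d{4})\b", "DATE", month_year)],
    },
    "spanish": {
        "months": {name: i for i, name in enumerate(ES_MONTHS, start=1)},
        "rules": [Rule(r"\b(?P<month>%MONTH) de (?P<year>\d{4})\b", "DATE", month_year)],
    },
}

def tag(text: str, language: str) -> List[Tuple[str, str, str]]:
    """Language-independent engine: apply every rule of the chosen language."""
    res = RESOURCES[language]
    months = res["months"]
    # Longest names first so that alternation prefers full matches.
    month_alt = "|".join(sorted(months, key=len, reverse=True))
    found = []
    for rule in res["rules"]:
        regex = re.compile(rule.pattern.replace("%MONTH", month_alt), re.IGNORECASE)
        for m in regex.finditer(text):
            found.append((m.group(0), rule.timex_type, rule.normalize(m, months)))
    return found

print(tag("HeidelTime was released in March 2013.", "english"))
print(tag("Publicado en marzo de 2013.", "spanish"))
```

In this sketch, extending coverage to, say, Italian would only require another `RESOURCES` entry with Italian month names and a rule template such as `"%MONTH %YEAR"`-style patterns; `tag` itself is untouched. Real temporal taggers additionally handle relative, underspecified, and duration expressions, which this toy example omits.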



      • Published in

        ACM Transactions on Asian Language Information Processing, Volume 13, Issue 1
        February 2014
        93 pages
        ISSN: 1530-0226
        EISSN: 1558-3430
        DOI: 10.1145/2590408

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 February 2014
        • Accepted: 1 October 2013
        • Revised: 1 September 2013
        • Received: 1 May 2013


        Qualifiers

        • research-article
        • Research
        • Refereed
