Toward an Effective Igbo Part-of-Speech Tagger

Authors:
Ikechukwu E. Onyenwe, University of Sheffield, South Yorkshire, UK (ORCID: 0000-0002-9727-7297)
Mark Hepple, University of Sheffield, South Yorkshire, UK
Uchechukwu Chinedu, Nnamdi Azikiwe University, Awka, Anambra, Nigeria
Ignatius Ezeani, University of Sheffield, South Yorkshire, UK

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4, Article No. 42, pp. 1–26. https://doi.org/10.1145/3314942

Published: 21 May 2019

Abstract

Part-of-speech (POS) tagging is a well-established technology for most Western European languages and a few other world languages, but it has not been evaluated on Igbo, an agglutinative African language. This article presents POS tagging experiments conducted using an Igbo corpus as a test bed for identifying the POS taggers and the Machine Learning (ML) methods that can achieve a good performance with the small dataset available for the language. Experiments have been conducted using different well-known POS taggers developed for English or European languages, and different training data styles and sizes. Igbo has a number of language-specific characteristics that present a challenge for effective POS tagging. One interesting case is the wide use of verbs (and nominalizations thereof) that have an inherent noun complement, which form “linked pairs” in the POS tagging scheme, but which may appear discontinuously. Another issue is Igbo’s highly productive agglutinative morphology, which can produce many variant word forms from a given root. This productivity is a key cause of the out-of-vocabulary (OOV) words observed during Igbo tagging. We report results of experiments on a promising direction for improving tagging performance on such morphologically-inflected OOV words.
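
To make the OOV problem described above concrete, the following is a minimal sketch of how an OOV rate could be measured against a lexicon learned from a small tagged corpus. It is not the authors' method or any of the taggers they evaluated: the most-frequent-tag baseline, the function names (`train_baseline`, `oov_rate`, `tag_words`), the `UNK` fallback tag, and the tokens are all assumptions made for illustration. The tokens are invented placeholders, not Igbo words.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def oov_rate(lexicon, tagged_sentences):
    """Return the fraction of tokens that never occurred in training."""
    tokens = [word for sentence in tagged_sentences for word, _ in sentence]
    if not tokens:
        return 0.0
    unseen = sum(1 for word in tokens if word not in lexicon)
    return unseen / len(tokens)


def tag_words(lexicon, words, fallback="UNK"):
    """Tag known words from the lexicon; OOV words get the fallback tag."""
    return [(word, lexicon.get(word, fallback)) for word in words]


# Invented placeholder data: "katara" stands in for an inflected variant
# of the root "kata" that was not observed during training.
train = [[("kata", "V"), ("omi", "N")], [("kata", "V"), ("seli", "N")]]
test = [[("katara", "V"), ("omi", "N")]]

lexicon = train_baseline(train)
rate = oov_rate(lexicon, test)
tagged = tag_words(lexicon, ["katara", "omi"])
```

In this toy setup the inflected variant is unseen even though its root is in the lexicon, which is the situation the abstract attributes to productive agglutinative morphology. A lexicon lookup alone cannot recover its tag.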

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
      2. Phonology / morphology
  2. Machine learning

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4
December 2019
305 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3327969
Editor: Nianwen Xue, Brandeis University, Waltham, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 May 2019
- Accepted: 1 February 2019
- Revised: 1 October 2018
- Received: 1 May 2018

Author Tags
African language
Igbo
Natural language processing (NLP)
POS tagger
corpora
corpus annotation
language technology
machine learning
morphological analysis
part-of-speech (POS) tagging
tagset
text processing
Qualifiers
- research-article
- Research
- Refereed