note

Boosting Neural POS Tagger for Farsi Using Morphological Information

Authors:
Peyman Passban, Qun Liu, Andy Way
ADAPT Centre, School of Computing, Dublin City University, Dublin, Ireland (all authors)

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 16, Issue 1, Article No. 4, pp. 1–15. https://doi.org/10.1145/2934676

Published: 22 July 2016

Abstract

Farsi (Persian) is a low-resource language that suffers from the data sparsity problem and a lack of efficient processing tools. Due to their broad application in natural language processing tasks, part-of-speech (POS) taggers are one of those important tools that should be considered in this respect. Despite recent work on Farsi tagging, there is still room for improvement. The best reported accuracy so far is 96%, which in special cases can rise to 96.9%. The main problem with existing taggers is their inefficiency in coping with out-of-vocabulary (OOV) words. Addressing both problems of accuracy and OOV words, we developed a neural network-based POS tagger (NPT) that performs efficiently on Farsi. Despite using less data, NPT provides better results in comparison to state-of-the-art systems. Our proposed tagger performs with an accuracy of 97.4%, with performance highly influenced by morphological features. We carry out a shallow morphological analysis and show considerable improvement over the baseline configuration.
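
The abstract does not specify which morphological features the tagger uses, so the following is only an illustrative sketch of what a shallow morphological analysis could look like, not the authors' method. It assumes that character prefixes and suffixes stand in for morphological cues, and that an OOV word is replaced by a placeholder token while its affixes are kept. The function names, the `<OOV>` token, and the transliterated example words are all hypothetical.

```python
from typing import Dict, List, Set


def affix_features(word: str, max_len: int = 3) -> Dict[str, str]:
    """Collect character prefixes and suffixes of length 1..max_len.

    Assumption: affixes approximate shallow morphological information.
    Affixes as long as the whole word are skipped.
    """
    feats: Dict[str, str] = {}
    for n in range(1, max_len + 1):
        if len(word) > n:
            feats[f"prefix_{n}"] = word[:n]
            feats[f"suffix_{n}"] = word[-n:]
    return feats


def token_features(tokens: List[str], i: int, vocab: Set[str]) -> Dict[str, str]:
    """Build features for tokens[i].

    An out-of-vocabulary word loses its identity (mapped to a placeholder)
    but keeps its affix features, so a tagger still sees surface cues.
    """
    word = tokens[i]
    feats = {"word": word if word in vocab else "<OOV>"}
    feats.update(affix_features(word))
    return feats


if __name__ == "__main__":
    # Hypothetical transliterated example: "ketabha" ends in the plural marker "ha".
    sentence = ["in", "ketabha"]
    known_words = {"in"}
    print(token_features(sentence, 1, known_words))
```

In this sketch the unseen word "ketabha" is represented only by its affixes, including the suffix "ha", which is the kind of surface cue a tagger could use when the word itself was never seen in training.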

Index Terms

Boosting Neural POS Tagger for Farsi Using Morphological Information
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 16, Issue 1
TALLIP Notes and Regular Papers
March 2017
133 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/2961867
Editor:
Nianwen Xue
Brandeis University, Waltham, USA
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 July 2016
- Accepted: 1 April 2016
- Revised: 1 March 2016
- Received: 1 January 2016
Author Tags
Farsi
POS tagging
morphological analysis
Qualifiers
- note
- Research
- Refereed