Assas-Band, an affix-exception-list based Urdu stemmer

Authors:
Qurat-ul-Ain Akram (NUCES, Pakistan), Asma Naseer (NUCES, Pakistan), Sarmad Hussain (NUCES, Pakistan)

ALR7: Proceedings of the 7th Workshop on Asian Language Resources, August 2009, Pages 40–46

Published: 06 August 2009

ABSTRACT

Both inflectional and derivational morphology lead to multiple surface forms of a word. Stemming reduces these forms to their stem or root and is a useful tool for many applications. No prior work on Urdu stemming has been reported. The current work develops an Urdu stemmer, Assas-Band, and improves its performance by using more precise affix-based exception lists instead of the conventional lexical lookup employed in stemmers for other languages. Testing shows an accuracy of 91.2%. Further enhancements are also suggested.
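
The abstract contrasts affix-based exception lists with a full lexical lookup. The sketch below illustrates that general idea only: it strips at most one prefix and one suffix by longest match, and skips the strip when the word appears in a small exception list for that affix type. The romanized affixes, the exception entries, and the names `PREFIXES`, `SUFFIXES`, `PREFIX_EXCEPTIONS`, `SUFFIX_EXCEPTIONS`, and `stem` are all hypothetical and are not taken from the paper, whose actual rules and lists may differ.

```python
from typing import Iterable, Set

# Hypothetical romanized affixes and exception entries, for illustration only.
PREFIXES = ("bad", "be")
SUFFIXES = ("on", "en", "i")
# Words that only look as if they carry an affix and must not be stripped.
PREFIX_EXCEPTIONS: Set[str] = {"badal"}
SUFFIX_EXCEPTIONS: Set[str] = {"pani"}


def _longest(affixes: Iterable[str], word: str, at_start: bool) -> str:
    """Return the longest affix matching the word edge, or '' if none does."""
    matches = [a for a in affixes
               if len(a) < len(word)
               and (word.startswith(a) if at_start else word.endswith(a))]
    return max(matches, key=len, default="")


def stem(word: str) -> str:
    """Strip one prefix and one suffix unless an exception list blocks it."""
    if word not in PREFIX_EXCEPTIONS:
        prefix = _longest(PREFIXES, word, at_start=True)
        word = word[len(prefix):]
    if word not in SUFFIX_EXCEPTIONS:
        suffix = _longest(SUFFIXES, word, at_start=False)
        if suffix:
            word = word[:-len(suffix)]
    return word
```

With these toy lists, `stem("kitabon")` returns `"kitab"`, while `stem("pani")` and `stem("badal")` are returned unchanged because each is listed as an exception. The exception lists only need to hold words that collide with an affix, which is what keeps them smaller than a full lexicon.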


Index Terms

Assas-Band, an affix-exception-list based Urdu stemmer
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Published in
ALR7: Proceedings of the 7th Workshop on Asian Language Resources
August 2009, 196 pages
ISBN: 9781932432565
Program Chairs: Hammam Riza (IPTEKnet-BPPT), Virach Sornlertlamvanich (NECTEC)
Publisher: Association for Computational Linguistics, United States
Qualifiers: research-article
