Assas-Band, an affix-exception-list based Urdu stemmer

Published: 6 August 2009

ABSTRACT

Both inflectional and derivational morphology lead to multiple surface forms of a word. Stemming reduces these forms back to their stem or root, and is a useful tool for many applications. No work has previously been reported on Urdu stemming. The current work develops an Urdu stemmer, Assas-Band, and improves its performance using more precise affix-based exception lists instead of the conventional lexical lookup employed in stemmers for other languages. Testing shows an accuracy of 91.2%. Further enhancements are also suggested.
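
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of affix stripping guarded by per-affix exception lists. It assumes that each affix carries its own list of words that merely look affixed and must be left untouched. The affixes, the romanized example words, and the function names are illustrative placeholders, not data or code from the paper.

```python
# Toy affix inventory; entries are hypothetical romanized placeholders,
# not taken from the paper.
PREFIXES = ("be",)
SUFFIXES = ("on", "en")

# Per-affix exception lists: words that begin or end with the affix
# string but where that string is part of the stem itself.
PREFIX_EXCEPTIONS = {"be": {"behtar"}}
SUFFIX_EXCEPTIONS = {"on": {"khatoon"}, "en": set()}


def strip_prefix(word):
    """Remove the longest matching prefix unless the word is an exception."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix):
            if word in PREFIX_EXCEPTIONS.get(prefix, set()):
                return word  # looks prefixed, but is listed as an exception
            return word[len(prefix):]
    return word


def strip_suffix(word):
    """Remove the longest matching suffix unless the word is an exception."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            if word in SUFFIX_EXCEPTIONS.get(suffix, set()):
                return word  # looks suffixed, but is listed as an exception
            return word[:-len(suffix)]
    return word


def stem(word):
    """Strip a prefix first, then a suffix (an assumed order for this sketch)."""
    return strip_suffix(strip_prefix(word))


if __name__ == "__main__":
    for w in ("kitabon", "bewafa", "behtar", "khatoon"):
        print(w, "->", stem(w))
```

In this sketch the exception lists take the place of a full lexicon lookup: only words that would otherwise be wrongly stripped need to be listed, one short list per affix.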

Published in

ALR7: Proceedings of the 7th Workshop on Asian Language Resources
August 2009
196 pages
ISBN: 9781932432565

Publisher

Association for Computational Linguistics, United States