ABSTRACT
Automatic part-of-speech tagging is an area of natural language processing where statistical techniques have been more successful than rule-based methods. In this paper, we present a simple rule-based part-of-speech tagger that automatically acquires its rules and tags text with accuracy comparable to that of stochastic taggers. The rule-based tagger has many advantages over these taggers, including a vast reduction in stored information required, the perspicuity of a small set of meaningful rules, ease of finding and implementing improvements to the tagger, and better portability from one tag set, corpus genre, or language to another. Perhaps the biggest contribution of this work is in demonstrating that the stochastic method is not the only viable method for part-of-speech tagging. The fact that a simple rule-based tagger that automatically learns its rules can perform so well should encourage researchers to further explore rule-based tagging, searching for a better and more expressive set of rule templates and other variations on the simple but effective theme described below.
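The abstract describes the tagger only at a high level. As a hedged illustration of what "automatically acquires its rules" can mean, here is a minimal Python sketch of error-driven rule learning. It assumes (1) an initial tagger that gives each word its most frequent tag in the training data, and (2) a single hypothetical rule template, "change tag a to b when the preceding tag is z". The function names, the template, and the toy corpus are illustrative assumptions, not the paper's actual rule set or code.

```python
from collections import Counter, defaultdict


def train_lexicon(corpus):
    """Map each word to its most frequent tag; also return the overall most frequent tag."""
    word_tags = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in corpus:
        for word, tag in sentence:
            word_tags[word][tag] += 1
            tag_totals[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default_tag = tag_totals.most_common(1)[0][0]  # used for unseen words
    return lexicon, default_tag


def initial_tags(words, lexicon, default_tag):
    """Initial-state tagger: most frequent tag per word, default tag if unseen."""
    return [lexicon.get(w, default_tag) for w in words]


def apply_rule(tags, rule):
    """Hypothetical template: change tag a to b when the preceding tag is z.

    Conditions are checked against the input tags, so all matches change at once.
    """
    a, b, z = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == z:
            out[i] = b
    return out


def learn_rules(corpus, lexicon, default_tag, max_rules=10):
    """Greedily pick the rule with the largest net error reduction, apply it, repeat."""
    gold = [[t for _, t in s] for s in corpus]
    current = [initial_tags([w for w, _ in s], lexicon, default_tag) for s in corpus]
    rules = []
    for _ in range(max_rules):
        # For each (current tag a, preceding tag z) context, count the correct tags.
        contexts = defaultdict(Counter)
        for cur, ref in zip(current, gold):
            for i in range(1, len(cur)):
                contexts[(cur[i], cur[i - 1])][ref[i]] += 1
        # Rule (a, b, z) fixes refs[b] tokens and breaks refs[a] already-correct ones.
        best, best_gain = None, 0
        for (a, z), refs in contexts.items():
            for b, fixed in refs.items():
                gain = fixed - refs[a]
                if b != a and gain > best_gain:
                    best, best_gain = (a, b, z), gain
        if best is None:  # no rule improves training accuracy
            break
        rules.append(best)
        current = [apply_rule(tags, best) for tags in current]
    return rules


def tag(words, lexicon, default_tag, rules):
    """Tag new text: initial-state tags, then each learned rule in order."""
    tags = initial_tags(words, lexicon, default_tag)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Toy corpus (illustrative only): "race" is usually NN, but VB after "to".
corpus = [
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("a", "DT"), ("race", "NN"), ("began", "VBD")],
    [("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")],
]
lexicon, default_tag = train_lexicon(corpus)
rules = learn_rules(corpus, lexicon, default_tag)
print(rules)  # [('NN', 'VB', 'TO')]
print(tag(["they", "want", "to", "race"], lexicon, default_tag, rules))  # ['PRP', 'VBP', 'TO', 'VB']
```

Training stops once no candidate rule has a positive net gain, and the learned model is just an ordered list of readable tuples. A richer template set would only add more context patterns to the scoring step; the abstract names a better and more expressive set of rule templates as a direction for further work.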
A simple rule-based part of speech tagger. In HLT '91: Proceedings of the workshop on Speech and Natural Language.