ABSTRACT
This paper presents a new method for managing morphological variation of keywords, called Frequent Case Generation (FCG). It is based on the skewed distributions of word forms in natural languages and is suitable for languages that either have a fair amount of morphological variation or are morphologically very rich. The method has so far been evaluated with four languages, Finnish, Swedish, German and Russian, which show varying degrees of morphological complexity.
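The idea of exploiting a skewed distribution of word forms can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `frequent_cases` and `expand_keyword`, the 80% coverage threshold, and the case counts are all assumptions made for illustration (the counts are invented, not corpus data). The sketch picks the smallest set of most frequent cases that covers a chosen share of occurrences and then expands a keyword only into those forms.

```python
from collections import Counter


def frequent_cases(case_counts, coverage=0.8):
    """Return the most frequent cases that together reach `coverage`.

    `case_counts` maps a case name to its number of occurrences.
    The threshold is a hypothetical parameter, not a value from the paper.
    """
    total = sum(case_counts.values())
    chosen, covered = [], 0
    # most_common() yields cases from the highest count to the lowest.
    for case, count in Counter(case_counts).most_common():
        chosen.append(case)
        covered += count
        if covered / total >= coverage:
            break
    return chosen


def expand_keyword(forms_by_case, cases):
    """Keep only the inflected forms of a keyword in the selected cases."""
    return [forms_by_case[c] for c in cases if c in forms_by_case]


# Invented counts for illustration only; real values would come from a corpus.
toy_counts = {
    "nominative": 40, "genitive": 25, "partitive": 17, "inessive": 8,
    "illative": 5, "elative": 3, "adessive": 2,
}

# Singular forms of the Finnish noun "talo" (house).
talo_forms = {
    "nominative": "talo", "genitive": "talon", "partitive": "taloa",
    "inessive": "talossa", "illative": "taloon", "elative": "talosta",
    "adessive": "talolla",
}

selected = frequent_cases(toy_counts)
query_forms = expand_keyword(talo_forms, selected)
print(selected)
print(query_forms)
```

With these toy counts, three of the seven cases already cover more than 80% of the occurrences, so the keyword is expanded into three forms rather than the full paradigm.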
Index Terms
- Management of keyword variation with frequency based generation of word forms in IR