Article

Management of keyword variation with frequency based generation of word forms in IR

Author:
Kimmo Kettunen

University of Tampere, Tampere, Finland


SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
Pages 691–692
https://doi.org/10.1145/1277741.1277861

Published: 23 July 2007

ABSTRACT

This paper presents a new method for managing morphological variation of keywords. The method, called FCG (Frequent Case Generation), is based on the skewed distributions of word forms in natural languages and is suitable for languages that either have a fair amount of morphological variation or are morphologically very rich. The proposed method has so far been evaluated with four languages, Finnish, Swedish, German and Russian, which show varying degrees of morphological complexity.
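The abstract does not spell out how the frequent forms are chosen, so the following is only a minimal sketch of the general idea, under stated assumptions: each keyword has corpus counts for its inflected forms, and only the most frequent forms are generated until they cover a chosen share of all occurrences. The function names `frequent_forms` and `expand_query`, the `coverage` threshold, and the counts for Finnish *talo* ("house") are hypothetical illustrations, not data or procedures from the paper.

```python
from collections import Counter


def frequent_forms(form_counts, coverage=0.8):
    """Pick the most frequent word forms until their cumulative share
    of all occurrences reaches `coverage` (an assumed selection rule)."""
    total = sum(form_counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    # most_common() yields forms in descending order of frequency.
    for form, count in Counter(form_counts).most_common():
        if covered / total >= coverage:
            break
        selected.append(form)
        covered += count
    return selected


def expand_query(keywords, counts_by_keyword, coverage=0.8):
    """Map each keyword to the forms that would be searched for it.
    A keyword without frequency data falls back to itself."""
    expanded = {}
    for keyword in keywords:
        forms = frequent_forms(counts_by_keyword.get(keyword, {}), coverage)
        expanded[keyword] = forms or [keyword]
    return expanded


if __name__ == "__main__":
    # Invented, illustrative counts for case forms of Finnish "talo".
    toy_counts = {
        "talo": 40, "talon": 25, "taloa": 15, "talossa": 10,
        "taloon": 5, "talosta": 3, "talolla": 2,
    }
    print(expand_query(["talo"], {"talo": toy_counts}, coverage=0.7))
```

Because word form distributions are skewed, a handful of forms can account for most occurrences, which is why a coverage threshold is one plausible way to keep the generated query short. A fixed number of forms per keyword would be an equally reasonable variant of this sketch.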

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741
General Chairs: Wessel Kraaij (TNO, The Netherlands), Arjen P. de Vries (CWI, The Netherlands)
Program Chairs: Charles L. A. Clarke (University of Waterloo, Canada), Norbert Fuhr (University of Duisburg-Essen, Germany), Noriko Kando (National Institute of Informatics, Japan)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evaluation
management of morphological variation
monolingual information retrieval
word form generation
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%