Morphological annotation of the Lithuanian corpus

Authors:
Vidas Daudaravičius (Vytautas Magnus University, Kaunas, Lithuania),
Erika Rimkutė (Vytautas Magnus University, Kaunas, Lithuania),
Andrius Utka (Vytautas Magnus University, Kaunas, Lithuania)

ACL '07: Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies, June 2007, pages 94–99

Published: 29 June 2007

ABSTRACT

As information technologies develop, large morphologically annotated corpora become a necessity for moving on to higher levels of language computerisation (e.g. automatic syntactic and semantic analysis, information extraction, machine translation). This article presents research on morphological disambiguation and on the morphological annotation of the 100-million-word Lithuanian corpus. Statistical methods have made it possible to develop an automatic morphological annotation tool for Lithuanian with a disambiguation precision of 94%. Statistical data on the distribution of parts of speech and on the most frequent word forms and lemmas in the annotated Corpus of the Contemporary Lithuanian Language are also presented.

Published in
ACL '07: Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
June 2007, 111 pages
Conference Chairs: Jakub Piskorski, Bruno Pouliquen, Ralf Steinberger, Hristo Tanev (all Joint Research Centre, IPSC)
Publisher: Association for Computational Linguistics, United States
Qualifiers: research-article
Acceptance Rates: ACL '07 paper acceptance rate 8 of 20 submissions, 40%; overall acceptance rate 85 of 443 submissions, 19%