DOI: 10.5555/1567545.1567563

Morphological annotation of the Lithuanian corpus

Published: 29 June 2007

ABSTRACT

As information technologies advance, large morphologically annotated corpora become a prerequisite for higher levels of language computerisation (e.g. automatic syntactic and semantic analysis, information extraction, machine translation). This article presents research on morphological disambiguation and on the morphological annotation of the 100-million-word Lithuanian corpus. Statistical methods made it possible to develop an automatic morphological annotation tool for Lithuanian with a disambiguation precision of 94%. Statistical data on the distribution of parts of speech and on the most frequent wordforms and lemmas in the annotated Corpus of the Contemporary Lithuanian Language are also presented.

Published in

ACL '07: Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
June 2007, 111 pages

Publisher: Association for Computational Linguistics, United States

Qualifier: research-article

Acceptance rates

ACL '07 paper acceptance rate: 8 of 20 submissions, 40%
Overall acceptance rate: 85 of 443 submissions, 19%
