research-article

E-connecting Balkan languages

Authors:
Cvetana Krstev (University of Belgrade)
Ranka Stanković (University of Belgrade)
Duško Vitas (University of Belgrade)
Svetla Koeva (Institute for Bulgarian)

MRTECEEL '09: Proceedings of the Workshop on Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages, September 2009, Pages 19–25

Published: 17 September 2009

ABSTRACT

In this paper we present a versatile language processing tool that can be used successfully for many Balkan languages. The tool relies on several sophisticated textual and lexical resources that have been developed for most Balkan languages. These resources are based on several de facto standards in natural language processing.

Index Terms

E-connecting Balkan languages
1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
MRTECEEL '09: Proceedings of the Workshop on Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages
September 2009
56 pages
Editors:
Elena Paskaleva (Bulgarian Academy of Sciences)
Stelios Piperidis (ILSP, Greece)
Milena Slavcheva (Bulgarian Academy of Sciences)
Cristina Vertan (University of Hamburg)
Publisher
Association for Computational Linguistics, United States
Author Tags
aligned texts
e-dictionaries
proper names
query expansion
wordnets