ABSTRACT
In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. The tool relies on several sophisticated textual and lexical resources that have been developed for most Balkan languages and that are based on several de facto standards in natural language processing.
Index Terms
- E-connecting Balkan languages