Abstract
This article explores how deeply annotated, incrementally evolving comparable corpora can be used to study language change, here across the stages from Old High German to New High German. Using the evolution of the German past tenses as an example, we show how categories of low to high complexity interact with the choice between competing linguistic variants. To explore the influence of these categories adequately, we use a multilayer corpus architecture that develops together with our study. We show that a combination of quantitative and qualitative analyses can identify relevant contextual factors, which in turn motivate new annotation layers over the same data. By making our categorizations explicit as corpus annotations and making our data available, we promote an open, extensible, and transparent mode of research in which both the raw data and the inferential process are exposed to other researchers.
Index Terms
- Measuring and coding language change: An evolving study in a multilayer corpus architecture