Measuring and coding language change: An evolving study in a multilayer corpus architecture

Published: 27 April 2012

Abstract

Our article explores the possibilities of using deeply annotated, incrementally evolving comparable corpora for the study of language change, in this case for different stages from Old High German to New High German. Using the example of the evolution of German past tenses, we show how a variety of categories ranging from low to high complexity interact with the choice between competing linguistic variants. To adequately explore the influence of these categories, we use a multilayer corpus architecture that develops together with our study. We show that a combination of quantitative and qualitative analyses can recognize relevant contextual factors, which feed into the addition of new annotation layers applying to the same data. By making our categorizations explicit as corpus annotations and our data available to other researchers, we promote an open, extensible, and transparent mode of research, where both raw data and the inferential process are exposed to other researchers.

    • Published in

      Journal on Computing and Cultural Heritage   Volume 5, Issue 1
      April 2012
      53 pages
      ISSN:1556-4673
      EISSN:1556-4711
      DOI:10.1145/2160165

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 April 2012
      • Accepted: 1 June 2011
      • Received: 1 February 2011

      Qualifiers

      • research-article
      • Research
      • Refereed