Analyzing Constructional Change: Linguistic Annotation and Sources of Uncertainty

Authors:
Marie-Luis Merten

Department of German Studies and Comparative Literary Studies, Paderborn University, Germany

Nina Seemann

Department of English and American Studies, Paderborn University, Germany

TEEM'18: Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing Multiculturality, October 2018, Pages 819–825
https://doi.org/10.1145/3284179.3284320

Published: 24 October 2018

ABSTRACT

This paper presents the sources of uncertainty we encounter in our project, which investigates language elaboration processes in Middle Low German. We are particularly interested in diachronic constructional changes and constructionalizations that involve and affect all linguistic dimensions. Studying them requires annotating our corpus with Part-of-Speech and constructional tags. In doing so, we are confronted with gradualness, gradience, and ambiguity as potential sources of uncertainty that complicate the annotation process. Furthermore, because the investigated language is historical, we expect annotators to face incomplete knowledge and to be prone to the comparative fallacy. For this reason, we are developing an interface that captures all of the annotators' doubts.

Index Terms

1. Human-centered computing
  1. Interaction design
    1. Interaction design process and methods
      1. Interface design prototyping

Published in
TEEM'18: Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing Multiculturality
October 2018
1072 pages
ISBN: 9781450365185
DOI: 10.1145/3284179
Editor: Francisco José García-Peñalvo, University of Salamanca
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Historical languages
ambiguity
gradience and gradualness
incomplete knowledge
linguistic annotations
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
TEEM'18 Paper Acceptance Rate: 151 of 243 submissions, 62%
Overall Acceptance Rate: 496 of 705 submissions, 70%