ABSTRACT
This paper presents the sources of uncertainty we encounter in our project. Our research focuses on language elaboration processes in Middle Low German. We are particularly interested in diachronic constructional changes and constructionalizations that involve and affect all linguistic dimensions. Investigating them requires annotating our corpus with part-of-speech and constructional tags. In doing so, we are confronted with gradualness, gradience, and ambiguity as potential sources of uncertainty that complicate the annotation process. Furthermore, because the investigated language is historical, we expect cases of incomplete knowledge and comparative fallacy on the part of the annotators. For this reason, we are developing an interface that captures all of the annotators' doubts.
Index Terms
- Analyzing Constructional Change: Linguistic Annotation and Sources of Uncertainty