ABSTRACT
In this paper, we present a corrected and error-tagged corpus of essays written by non-native speakers of English. The corpus contains 63,000 words and includes data from learners of English with nine first-language backgrounds. The annotation was performed at the sentence level and involved correcting all errors in each sentence. The error classification covers mistakes in preposition and article usage, as well as errors in grammar, word order, and word choice. We present an analysis of the errors in the annotated corpus by error category and first-language background, and we report inter-annotator agreement on the task.
We also describe a computer program that was developed to facilitate and standardize the annotation procedure for the task. The program allows for the annotation of various types of mistakes and was used in the annotation of the corpus.
Annotating ESL errors: challenges and rewards