ABSTRACT
We apply machine learning techniques to study language transfer, a major topic in the theory of Second Language Acquisition (SLA). Using an SVM for native language classification, we show that a careful analysis of the effects of various features can lead to scientific insights. In particular, we demonstrate that character bigrams alone yield a classification accuracy of about 66% on a 5-class task, even when content and function word differences are accounted for. This suggests that native language has a strong effect on the word choice of people writing in a second language.
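To make the character-bigram feature concrete, here is a minimal sketch of how a text could be turned into a bigram frequency vector before being passed to a classifier. The abstract does not say how the features were weighted or selected, so the relative-frequency weighting, the lowercasing, and the helper names `top_bigrams` and `char_bigram_features` are assumptions made for illustration only. The SVM training step is not shown.

```python
from collections import Counter


def _bigrams(text):
    """Return the overlapping character bigrams of a lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def top_bigrams(texts, k):
    """Pick the k most frequent character bigrams across a collection of texts.

    Hypothetical helper: the abstract does not state how the bigram
    vocabulary was chosen.
    """
    counts = Counter()
    for text in texts:
        counts.update(_bigrams(text))
    return [bigram for bigram, _ in counts.most_common(k)]


def char_bigram_features(text, vocabulary):
    """Map a text to the relative frequency of each bigram in `vocabulary`.

    Assumption: relative frequency is used as the feature weight.
    """
    bigrams = _bigrams(text)
    counts = Counter(bigrams)
    total = len(bigrams) or 1  # avoid division by zero for very short texts
    return [counts[bigram] / total for bigram in vocabulary]
```

Each essay would yield one such vector, labeled with the writer's native language, and the resulting vectors could then be given to any multi-class SVM implementation.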
- Using classifier features for studying the effect of native language on the choice of written second language words