DOI: 10.5555/1629795.1629797

Using classifier features for studying the effect of native language on the choice of written second language words

Published: 29 June 2007

ABSTRACT

We apply machine learning techniques to study language transfer, a major topic in the theory of Second Language Acquisition (SLA). Using an SVM for the problem of native language classification, we show that a careful analysis of the effects of various features can lead to scientific insights. In particular, we demonstrate that character bigrams alone allow a classification accuracy of about 66% on a 5-class task, even when content and function word differences are accounted for. This may show that native language has a strong effect on the word choice of people writing in a second language.
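To make the notion of character bigram features concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not state how text was normalized or how features were weighted, so the lowercasing, whitespace collapsing, and relative-frequency weighting below are assumptions, and the function name `char_bigram_features` is hypothetical.

```python
from collections import Counter
from typing import Dict


def char_bigram_features(text: str) -> Dict[str, float]:
    """Map each character bigram in `text` to its relative frequency.

    Assumption: text is lowercased and runs of whitespace are collapsed
    to a single space; the abstract does not specify any normalization.
    """
    normalized = " ".join(text.lower().split())
    # Slide a window of width 2 over the normalized string.
    bigrams = [normalized[i:i + 2] for i in range(len(normalized) - 1)]
    counts = Counter(bigrams)
    total = sum(counts.values())
    if total == 0:
        # Empty or single-character input yields no bigrams.
        return {}
    return {bigram: n / total for bigram, n in counts.items()}


features = char_bigram_features("The  cat")
```

For the input above, the six bigrams "th", "he", "e ", " c", "ca", and "at" each receive a weight of 1/6. Vectors of this kind, computed per essay, could then be passed to any multi-class SVM implementation with the writer's native language as the label.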

Published in

CACLA '07: Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition
June 2007, 108 pages

Publisher: Association for Computational Linguistics, United States

Qualifiers: research-article
