Using classifier features for studying the effect of native language on the choice of written second language words

Authors: Oren Tsur and Ari Rappoport (The Hebrew University, Jerusalem, Israel)

CACLA '07: Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition, June 2007, Pages 9–16

Published: 29 June 2007

ABSTRACT

We apply machine learning techniques to study language transfer, a major topic in the theory of Second Language Acquisition (SLA). Using an SVM for the problem of native language classification, we show that a careful analysis of the effects of various features can lead to scientific insights. In particular, we demonstrate that character bigrams alone yield a classification accuracy of about 66% on a 5-class task, even when content and function word differences are accounted for. This suggests that native language has a strong effect on the word choice of people writing in a second language.
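
As an illustration of the kind of feature the abstract refers to, the following minimal sketch computes relative frequencies of character bigrams for a text and lays them out as a fixed-length vector that a classifier such as an SVM could consume. The function names, the lowercasing and whitespace normalization, and the use of plain relative frequencies are assumptions made for this example; the abstract does not specify the authors' preprocessing or feature weighting, and no classifier is trained here.

```python
from collections import Counter


def char_bigram_frequencies(text: str) -> dict[str, float]:
    """Return the relative frequency of each character bigram in `text`.

    Assumption for this sketch: the text is lowercased and runs of
    whitespace are collapsed to a single space before counting.
    """
    normalized = " ".join(text.lower().split())
    bigrams = [normalized[i:i + 2] for i in range(len(normalized) - 1)]
    if not bigrams:
        return {}
    counts = Counter(bigrams)
    total = len(bigrams)
    return {bigram: count / total for bigram, count in counts.items()}


def to_vector(freqs: dict[str, float], vocabulary: list[str]) -> list[float]:
    """Project bigram frequencies onto a fixed, ordered bigram vocabulary."""
    return [freqs.get(bigram, 0.0) for bigram in vocabulary]


if __name__ == "__main__":
    essay = "The weather is very nice today"  # hypothetical learner text
    freqs = char_bigram_frequencies(essay)
    vocabulary = sorted(freqs)  # in practice, built from the whole corpus
    print(to_vector(freqs, vocabulary)[:5])
```

In a full pipeline, one such vector per essay, labeled with the writer's native language, would form the training input for a multi-class classifier.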

Using classifier features for studying the effect of native language on the choice of written second language words
1. Computing methodologies
  1. Artificial intelligence
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Published in
CACLA '07: Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition
June 2007
108 pages
Program Chairs: Paula Buttery (University of Cambridge, UK), Aline Villavicencio (Federal University of Rio Grande do Sul, Brazil; University of Bath, UK), Anna Korhonen (University of Cambridge, UK)
Publisher: Association for Computational Linguistics, United States
Qualifiers: research-article
