DOI: 10.3115/1067807.1067843
Language independent authorship attribution using character level language models

Published: 12 April 2003

ABSTRACT

We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach rests on simple information-theoretic principles and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate its effectiveness and language independence, we present experimental results on Greek, English, and Chinese data. We show that our approach achieves state-of-the-art performance in each of these cases. In particular, we obtain an 18% accuracy improvement over the best published results for a Greek data set, while using a far simpler technique than previous investigations.
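
The general idea of attribution with character-level n-gram language models can be illustrated with a minimal sketch: train one model per candidate author, then assign a disputed text to the author whose model gives it the lowest cross-entropy. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the trigram order (`n=3`), the add-one smoothing, and the `vocab_size` of 256 (far too small for Chinese, for example). The abstract does not state which smoothing method or n-gram order the authors used.

```python
import math
from collections import Counter


def train_char_ngram(text, n=3):
    """Count character n-grams and their (n-1)-character contexts."""
    # Pad the start so the first characters also have a full context.
    padded = " " * (n - 1) + text
    ngrams = Counter(padded[i:i + n] for i in range(len(text)))
    contexts = Counter(padded[i:i + n - 1] for i in range(len(text)))
    return ngrams, contexts


def cross_entropy(text, model, n=3, vocab_size=256):
    """Average bits per character of `text` under an add-one smoothed model.

    Add-one smoothing and vocab_size are illustrative assumptions.
    """
    if not text:
        raise ValueError("text must be non-empty")
    ngrams, contexts = model
    padded = " " * (n - 1) + text
    total = 0.0
    for i in range(len(text)):
        gram = padded[i:i + n]
        # Counter returns 0 for unseen n-grams and contexts.
        prob = (ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab_size)
        total -= math.log2(prob)
    return total / len(text)


def attribute(text, models, n=3):
    """Return the author whose model assigns `text` the lowest cross-entropy."""
    return min(models, key=lambda author: cross_entropy(text, models[author], n))
```

In use, `models` would be a dictionary mapping each author name to the result of `train_char_ngram` on that author's known writing, built with the same `n` that is later passed to `attribute`. Because the model works on raw characters, no tokenization or feature selection step is involved, which is what makes this kind of approach easy to apply across languages.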

Published in

EACL '03: Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
April 2003
394 pages
ISBN: 1333567890

Publisher: Association for Computational Linguistics, United States

Acceptance Rates

Overall acceptance rate: 100 of 360 submissions, 28%
