DOI: 10.1145/956750.956805
Article

Style mining of electronic messages for multiple authorship discrimination: first results

Published: 24 August 2003

ABSTRACT

This paper considers the use of computational stylistics for performing authorship attribution of electronic messages, addressing categorization problems with as many as 20 different classes (authors). Effective stylistic characterization of text is potentially useful for a variety of tasks, as language style contains cues regarding the authorship, purpose, and mood of the text, all of which would be useful adjuncts to information retrieval or knowledge-management tasks. We focus here on the problem of determining the author of an anonymous message, based only on the message text. Several multiclass variants of the Winnow algorithm were applied to a vector representation of the message texts to learn models for discriminating different authors. We present results comparing the classification accuracy of the different approaches. The results show that stylistic models can be accurately learned to determine an author's identity.
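The abstract does not specify the feature set or the exact multiclass Winnow variants. As a rough illustration only, the sketch below makes two assumptions. First, a message is represented by the relative frequencies of a small, hypothetical function-word list. Second, it uses one simple mistake-driven multiclass Winnow scheme: one positive weight vector per author, with prediction by highest score. On an error, the scheme multiplicatively promotes the true author's weights and demotes the wrongly predicted author's weights. The names `FUNCTION_WORDS`, `vectorize`, and `MulticlassWinnow`, and the parameters `alpha`, `beta`, and `epochs`, are illustrative assumptions, not the paper's.

```python
import re
from collections import Counter

# Hypothetical function-word list (assumption; the paper's feature set is not given here).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "i",
                  "you", "for", "on", "with", "but", "not", "so", "just", "very", "really"]


def vectorize(text):
    """Map a message to the relative frequency of each function word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty messages
    return [counts[w] / n for w in FUNCTION_WORDS]


class MulticlassWinnow:
    """One positive weight vector per author, updated multiplicatively on mistakes.

    This is one simple multiclass Winnow variant, assumed for illustration.
    """

    def __init__(self, classes, n_features, alpha=1.5, beta=0.5, epochs=10):
        self.classes = list(classes)
        self.alpha, self.beta, self.epochs = alpha, beta, epochs
        # Winnow starts with all weights equal and positive.
        self.w = {c: [1.0] * n_features for c in self.classes}

    def score(self, c, x):
        return sum(wi * xi for wi, xi in zip(self.w[c], x))

    def predict(self, x):
        # Highest-scoring author wins; ties go to the earliest class in self.classes.
        return max(self.classes, key=lambda c: self.score(c, x))

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                guess = self.predict(x)
                if guess != label:
                    # Promote the true author and demote the predicted one. Raising
                    # alpha/beta to the feature value handles real-valued features;
                    # features with x == 0 are left unchanged.
                    self.w[label] = [wi * self.alpha ** xi for wi, xi in zip(self.w[label], x)]
                    self.w[guess] = [wi * self.beta ** xi for wi, xi in zip(self.w[guess], x)]
        return self


X = [vectorize("the of and the of and"), vectorize("you just really you just really")]
y = ["alice", "bob"]
model = MulticlassWinnow(["alice", "bob"], len(FUNCTION_WORDS)).fit(X, y)
print(model.predict(vectorize("really you just")))  # bob
```

Because a feature that is absent from a message (x = 0) leaves its weights unchanged, each mistake touches only the features that the message actually uses. This is one reason Winnow-style learners suit sparse, high-dimensional text vectors.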


        Published in
        KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2003
        736 pages
        ISBN: 1581137370
        DOI: 10.1145/956750

        Copyright © 2003 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher: Association for Computing Machinery, New York, NY, United States



        Acceptance rates: KDD '03 paper acceptance rate 46 of 298 submissions (15%); overall acceptance rate 1,133 of 8,635 submissions (13%).
