DOI: 10.3115/1118693.1118704
Article

Thumbs up?: sentiment classification using machine learning techniques

Published: 06 July 2002

ABSTRACT

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
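The abstract contrasts a human-produced baseline with learned classifiers, one of which is Naive Bayes. The following is a minimal sketch of that contrast, not the authors' implementation. Several details are assumptions made only for illustration: the crude regex tokenizer, the binarized (word-presence) multinomial Naive Bayes variant with add-one smoothing, the hypothetical indicator-word lists `POSITIVE_WORDS` and `NEGATIVE_WORDS`, and the tiny invented training set.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and pull out runs of letters/apostrophes (deliberately crude)."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical human-chosen indicator words, used only by the baseline.
POSITIVE_WORDS = {"great", "excellent", "wonderful", "brilliant"}
NEGATIVE_WORDS = {"bad", "boring", "awful", "terrible"}


def wordlist_baseline(text):
    """Label a review by comparing counts of positive vs. negative list words.

    Returns "pos", "neg", or "tie" when the counts are equal.
    """
    tokens = tokenize(text)
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    if pos > neg:
        return "pos"
    if neg > pos:
        return "neg"
    return "tie"


class NaiveBayes:
    """Multinomial Naive Bayes over binarized unigram features
    (each word counted at most once per document), add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            present = set(tokenize(doc))  # presence, not frequency
            self.word_counts[label].update(present)
            self.vocab |= present
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)  # log prior P(c)
        v = len(self.vocab)
        for w in set(tokenize(doc)):
            if w not in self.vocab:
                continue  # skip words never seen in training
            # Laplace-smoothed log P(w | c)
            score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Invented toy data, for illustration only.
train_docs = [
    "a brilliant, moving film with great performances",
    "wonderful script and excellent pacing",
    "boring plot and terrible dialogue",
    "an awful, tedious mess",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
review = "what a tedious mess"
print(wordlist_baseline(review))  # tie
print(model.predict(review))      # neg
```

In this toy case the review contains none of the listed indicator words, so the word-list baseline cannot decide, while the learned model picks up "tedious" and "mess" from the training data. The example only illustrates the mechanics; it says nothing about the accuracies reported in the paper.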

Published in

EMNLP '02: Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
July 2002
328 pages

Publisher

Association for Computational Linguistics, United States



Acceptance Rates

Overall Acceptance Rate: 73 of 234 submissions, 31%
