DOI: 10.3115/1220355.1220476

Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis

Published: 23 August 2004

ABSTRACT

We demonstrate that it is possible to perform automatic sentiment classification in the very noisy domain of customer feedback data. We show that by using large feature vectors in combination with feature reduction, we can train linear support vector machines that achieve high classification accuracy on data that present classification challenges even for a human annotator. We also show that, surprisingly, the addition of deep linguistic analysis features to a set of surface level word n-gram features contributes consistently to classification accuracy in this domain.
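
As a rough illustration of the pipeline the abstract describes (surface word n-gram features, feature reduction, then a linear SVM), the following is a minimal sketch and not the paper's implementation. The toy texts, the function names, the document-frequency-difference selection score, and the plain subgradient training loop are all assumptions made for illustration; the abstract does not say which reduction criterion or SVM trainer was used, and the deep linguistic analysis features are not modelled here.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) present in a text."""
    tokens = text.lower().split()
    feats = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def select_features(docs, labels, k):
    """Keep the k features whose document frequency differs most between
    the two classes (a simple stand-in for a real reduction criterion)."""
    pos, neg = Counter(), Counter()
    for feats, y in zip(docs, labels):
        (pos if y > 0 else neg).update(feats)
    score = {f: abs(pos[f] - neg[f]) for f in set(pos) | set(neg)}
    ranked = sorted(score, key=lambda f: (-score[f], f))  # ties broken by name
    return set(ranked[:k])


def train_linear_svm(docs, labels, lr=0.1, lam=0.01, epochs=50):
    """Subgradient descent on the L2-regularised hinge loss, with binary
    presence features. Returns the weight table and the bias."""
    w, b = Counter(), 0.0
    for _ in range(epochs):
        for feats, y in zip(docs, labels):
            margin = y * (sum(w[f] for f in feats) + b)
            for f in w:                      # shrink step from the L2 penalty
                w[f] *= 1 - lr * lam
            if margin < 1:                   # inside the margin: hinge update
                for f in feats:
                    w[f] += lr * y
                b += lr * y
    return w, b


# Hypothetical toy feedback snippets: +1 = satisfied, -1 = dissatisfied.
texts = [
    "great product works well",
    "love it works great",
    "terrible product does not work",
    "hate it does not work",
]
labels = [1, 1, -1, -1]

docs = [ngrams(t) for t in texts]
kept = select_features(docs, labels, k=5)
w, b = train_linear_svm([d & kept for d in docs], labels)


def predict(text):
    """Classify a text using only the features kept by the reduction step."""
    score = sum(w[f] for f in ngrams(text) & kept) + b
    return 1 if score > 0 else -1
```

Calling `predict("it does not work")` on the trained toy model exercises the full path: n-gram extraction, restriction to the kept features, and the linear decision rule. On real feedback data one would start from a far larger feature vector and keep thousands of features rather than five.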

Published in: COLING '04: Proceedings of the 20th international conference on Computational Linguistics, August 2004, 1411 pages

Publisher: Association for Computational Linguistics, United States
