DOI: 10.5555/1858681.1858822
research-article

A study of information retrieval weighting schemes for sentiment analysis

Published: 11 July 2010

ABSTRACT

Most sentiment analysis approaches use a support vector machine (SVM) classifier with binary unigram weights as their baseline. In this paper, we explore whether more sophisticated feature weighting schemes from Information Retrieval can enhance classification accuracy. We show that variants of the classic tf.idf scheme adapted to sentiment analysis provide significant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing. The techniques are tested on a wide selection of data sets and, to our knowledge, produce the best accuracy.
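
As a rough illustration of the two ingredients the abstract singles out, the sketch below contrasts binary unigram weights with a tf.idf weight built from a sublinear term frequency and a smoothed document frequency. The specific formulas, 1 + ln(tf) and ln((N + 1) / (df + 1)) + 1, are common textbook forms chosen here as assumptions; they are not necessarily the variants evaluated in the paper, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def sublinear_tf(count):
    """Sublinear term frequency (assumed form): 1 + ln(count), or 0 if absent."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def smoothed_idf(n_docs, doc_freq):
    """Smoothed inverse document frequency (assumed form).

    Adding 1 to both counts avoids division by zero for unseen terms,
    and the trailing + 1 keeps terms that occur in every document
    from being weighted to zero.
    """
    return math.log((n_docs + 1) / (doc_freq + 1)) + 1.0


def weight_documents(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weighted = []
    for tokens in docs:
        counts = Counter(tokens)
        weighted.append({
            term: sublinear_tf(count) * smoothed_idf(n_docs, doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Toy corpus (hypothetical), already tokenized into unigrams.
docs = [
    "great great great film".split(),
    "dull film".split(),
]

# Baseline described in the abstract: binary unigram presence.
binary = [{term: 1.0 for term in set(tokens)} for tokens in docs]

# Sublinear tf combined with smoothed idf.
weights = weight_documents(docs)
```

Under the binary scheme every term present in a document gets the same weight, whereas here a repeated, rarer term such as "great" outweighs "film", which appears in both documents. Either feature map could then be passed to a linear SVM.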

Published in

ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
July 2010
1618 pages
Program Chair: Jan Hajič

Publisher

Association for Computational Linguistics, United States

Acceptance Rates

Overall acceptance rate: 85 of 443 submissions, 19%
