ABSTRACT
In this paper, we define and study a new opinionated text data analysis problem called Latent Aspect Rating Analysis (LARA). LARA analyzes the opinions expressed about an entity in an online review at the level of topical aspects, in order to discover each individual reviewer's latent opinion on each aspect as well as the relative emphasis the reviewer places on different aspects when forming an overall judgment of the entity. We propose a novel probabilistic rating regression model to solve this new text mining problem in a general way. Empirical experiments on a hotel review data set show that the proposed latent rating regression model can effectively solve the LARA problem, and that the detailed aspect-level analysis of opinions it enables can support a wide range of application tasks, such as aspect opinion summarization, entity ranking based on aspect ratings, and analysis of reviewers' rating behavior.
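The abstract does not spell out the form of the model, so the following is only an illustrative sketch of the idea that a reviewer's overall judgment reflects both per-aspect opinions and the relative emphasis placed on each aspect. It assumes, purely for illustration, that the overall rating is a weighted sum of aspect ratings with weights normalized to sum to one; the function name `overall_rating` and the example aspects and numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def overall_rating(aspect_ratings: Sequence[float],
                   aspect_weights: Sequence[float]) -> float:
    """Combine per-aspect ratings into one overall rating.

    Assumption (not from the paper): the overall rating is a weighted
    sum of aspect ratings, with weights normalized to sum to 1.
    """
    if len(aspect_ratings) != len(aspect_weights):
        raise ValueError("ratings and weights must have the same length")
    total = sum(aspect_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Normalize so the weights express relative emphasis across aspects.
    normalized = [w / total for w in aspect_weights]
    return sum(w * r for w, r in zip(normalized, aspect_ratings))


# Hypothetical hotel review: aspects are (value, room, location).
ratings = [4.0, 2.0, 5.0]    # latent opinion on each aspect
weights = [0.5, 0.25, 0.25]  # relative emphasis placed by this reviewer
print(overall_rating(ratings, weights))  # 3.75
```

In this simplified view, the aspect ratings and weights are given directly. In the LARA setting described above they are latent, and would have to be inferred from the review text and the observed overall rating.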
Index Terms
- Latent aspect rating analysis on review text data: a rating regression approach