research-article

Modeling online reviews with multi-grain topic models

Authors:
Ivan Titov, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Ryan McDonald, Google Inc., New York, NY, USA

WWW '08: Proceedings of the 17th international conference on World Wide Web, April 2008, Pages 111–120, https://doi.org/10.1145/1367497.1367513

Published: 21 April 2008

ABSTRACT

In this paper we present a novel framework for extracting the ratable aspects of objects from online user reviews. Extracting such aspects is an important challenge in automatically mining product opinions from the web and in generating opinion-based summaries of user reviews [18, 19, 7, 12, 27, 36, 21]. Our models are based on extensions to standard topic modeling methods such as LDA and PLSA to induce multi-grain topics. We argue that multi-grain models are more appropriate for our task since standard models tend to produce topics that correspond to global properties of objects (e.g., the brand of a product type) rather than the aspects of an object that tend to be rated by a user. The models we present not only extract ratable aspects, but also cluster them into coherent topics, e.g., 'waitress' and 'bartender' are part of the same topic 'staff' for restaurants. This differentiates our approach from much of the previous work, which extracts aspects through term frequency analysis with minimal clustering. We evaluate the multi-grain models both qualitatively and quantitatively to show that they improve significantly upon standard topic models.
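For readers unfamiliar with the standard topic models the abstract contrasts against, the following is a minimal sketch of single-grain LDA fitted by collapsed Gibbs sampling. It is an illustration of the baseline only, not the multi-grain model proposed in the paper. The function names (`lda_gibbs`, `top_words`), the hyperparameter values, and the toy review snippets are all assumptions made for this example and do not come from the paper.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for standard (single-grain) LDA.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] and topic_word[k][w] are raw assignment counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Toy, made-up review snippets (hypothetical, not data from the paper).
reviews = [
    "waitress friendly bartender staff helpful".split(),
    "staff rude waitress slow bartender".split(),
    "pasta delicious sauce fresh dessert".split(),
    "dessert sweet pasta sauce bland".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(reviews, num_topics=2)
print(top_words(topic_word, vocab))
```

In this sketch each review has a single topic mixture for the whole document, which is the property the abstract argues tends to yield global topics rather than ratable aspects. The sampled topics depend on the seed and the toy data, so no particular output is implied.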

References

[1] P. Beineke, T. Hastie, C. Manning, and S. Vaithyanathan. An exploration of sentiment summarization. In Proc. of AAAI, 2003.
[2] D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems 16, 2004.
[3] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(5):993--1022, 2003.
[4] D. M. Blei and J. D. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems (NIPS), 2008.
[5] D. M. Blei and P. J. Moreno. Topic segmentation with an aspect hidden Markov model. In Proc. of the Conference on Research & Development on Information Retrieval (SIGIR), pages 343--348, 2001.
[6] G. Carenini, R. Ng, and A. Pauls. Multi-Document Summarization of Evaluative Text. In Proc. of the Conf. of the European Chapter of the Association for Computational Linguistics, 2006.
[7] G. Carenini, R. Ng, and E. Zwart. Extracting knowledge from evaluative text. In Proc. of the 3rd Int. Conf. on Knowledge Capture, pages 11--18, 2005.
[8] K. Crammer and Y. Singer. Pranking with ranking. In Advances in Neural Information Processing Systems (NIPS), pages 641--647, 2002.
[9] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
[10] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1--38, 1977.
[11] K. Fujimura, T. Inoue, and M. Sugisaki. The EigenRumor Algorithm for Ranking Blogs. In WWW Workshop on the Weblogging Ecosystem, 2005.
[12] M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger. Pulse: Mining customer opinions from free text. In Proc. of the 6th International Symposium on Intelligent Data Analysis, pages 121--132, 2005.
[13] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721--741, 1984.
[14] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proc. of the National Academy of Sciences, 101 Suppl 1:5228--5235, 2004.
[15] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum. Integrating topics and syntax. In Advances in Neural Information Processing Systems, 2004.
[16] A. Gruber, Y. Weiss, and M. Rosen-Zvi. Hidden Topic Markov Models. In Proc. of the Conference on Artificial Intelligence and Statistics, 2007.
[17] T. Hofmann. Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 42(1):177--196, 2001.
[18] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proc. of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168--177, 2004.
[19] M. Hu and B. Liu. Mining Opinion Features in Customer Reviews. In Proc. of Nineteenth National Conference on Artificial Intelligence, 2004.
[20] W. Li and A. McCallum. Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. In Proc. Int. Conference on Machine Learning, 2006.
[21] Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai. Topic sentiment mixture: modeling facets and opinions in weblogs. In Proc. of the 16th Int. Conference on World Wide Web, pages 171--180, 2007.
[22] D. Mimno, W. Li, and A. McCallum. Mixtures of hierarchical topics with Pachinko allocation. In Proc. 24th Int. Conf. on Machine Learning (ICML), 2007.
[23] T. Minka and J. La. Expectation-propagation for the generative aspect model. In Proc. of the 18th Conf. on Uncertainty in Artificial Intelligence, 2002.
[24] I. Ounis, M. de Rijke, C. Macdonald, G. Mishne, and I. Soboroff. Overview of the TREC-2006 Blog Track. In Text REtrieval Conference (TREC), 2006.
[25] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proc. of the Conf. on Empirical Methods in Natural Language Processing, 2002.
[26] F. Pereira, N. Tishby, and L. Lee. Distributional clustering of English words. In Proc. 31st Meeting of Association for Computational Linguistics, 1993.
[27] A. Popescu and O. Etzioni. Extracting product features and opinions from reviews. In Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2005.
[28] M. Purver, K. Kording, T. Griffiths, and J. Tenenbaum. Unsupervised topic modelling for multi-party spoken discourse. In Proc. of the Annual Meeting of the ACL and the International Conference on Computational Linguistics, pages 17--24, 2006.
[29] L. Saul and F. Pereira. Aggregate and mixed-order Markov models for statistical language processing. In Proc. of the 2nd Int. Conf. on Empirical Methods in Natural Language Processing, 1997.
[30] B. Snyder and R. Barzilay. Multiple Aspect Ranking using the Good Grief Algorithm. In Proc. of the Joint Conference of the North American Chapter of the Association for Computational Linguistics and Human Language Technologies, pages 300--307, 2007.
[31] P. Turney. Thumbs up or thumbs down? Sentiment orientation applied to unsupervised classification of reviews. In Proc. of the Annual Meeting of the ACL, 2002.
[32] H. M. Wallach. Topic modeling: beyond bag-of-words. In Int. Conference on Machine Learning, 2006.
[33] X. Wang and A. McCallum. A note on topical n-grams. Technical Report UM-CS-2005-071, University of Massachusetts, 2005.
[34] J. Wiebe. Learning subjective adjectives from corpora. In Proc. of the National Conference on Artificial Intelligence, 2000.
[35] C. Zhai, A. Velivelli, and B. Yu. A Cross-Collection Mixture Model for Comparative Text Mining. In Proc. of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, pages 743--748, 2004.
[36] L. Zhuang, F. Jing, and X. Zhu. Movie review mining and summarization. In Proc. of the 15th ACM international conference on Information and knowledge management (CIKM), pages 43--50, 2006.

Index Terms

Modeling online reviews with multi-grain topic models
1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
    1. Data mining

Published in
WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN: 9781605580852
DOI: 10.1145/1367497
General Chairs:
Jinpeng Huai, Beihang University, China
Robin Chen, AT&T Labs, USA
Hsiao-Wuen Hon, Microsoft Research Asia, China
Yunhao Liu, HK University of Science and Technology, Hong Kong
Program Chairs:
Wei-Ying Ma, Microsoft Research Asia, China
Andrew Tomkins, Yahoo! Research, USA
Xiaodong Zhang, The Ohio State University, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
opinion mining
topic models
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
Article Metrics
- Total Citations: 425
- Total Downloads: 4,077
- Downloads (Last 12 months): 139
- Downloads (Last 6 weeks): 13