DOI: 10.1145/1367497.1367513
research-article

Modeling online reviews with multi-grain topic models

Published: 21 April 2008

ABSTRACT

In this paper we present a novel framework for extracting the ratable aspects of objects from online user reviews. Extracting such aspects is an important challenge in automatically mining product opinions from the web and in generating opinion-based summaries of user reviews [18, 19, 7, 12, 27, 36, 21]. Our models are based on extensions to standard topic modeling methods such as LDA and PLSA to induce multi-grain topics. We argue that multi-grain models are more appropriate for our task since standard models tend to produce topics that correspond to global properties of objects (e.g., the brand of a product type) rather than the aspects of an object that tend to be rated by a user. The models we present not only extract ratable aspects, but also cluster them into coherent topics, e.g., 'waitress' and 'bartender' are part of the same topic 'staff' for restaurants. This differentiates them from much of the previous work, which extracts aspects through term frequency analysis with minimal clustering. We evaluate the multi-grain models both qualitatively and quantitatively to show that they improve significantly upon standard topic models.

References

  1. P. Beineke, T. Hastie, C. Manning, and S. Vaithyanathan. An exploration of sentiment summarization. In Proc. of AAAI, 2003.
  2. D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems 16, 2004.
  3. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(5):993--1022, 2003.
  4. D. M. Blei and J. D. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems (NIPS), 2008.
  5. D. M. Blei and P. J. Moreno. Topic segmentation with an aspect hidden Markov model. In Proc. of the Conference on Research & Development on Information Retrieval (SIGIR), pages 343--348, 2001.
  6. G. Carenini, R. Ng, and A. Pauls. Multi-Document Summarization of Evaluative Text. In Proc. of the Conf. of the European Chapter of the Association for Computational Linguistics, 2006.
  7. G. Carenini, R. Ng, and E. Zwart. Extracting knowledge from evaluative text. In Proc. of the 3rd Int. Conf. on Knowledge Capture, pages 11--18, 2005.
  8. K. Crammer and Y. Singer. Pranking with ranking. In Advances in Neural Information Processing Systems (NIPS), pages 641--647, 2002.
  9. S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
  10. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1--38, 1977.
  11. K. Fujimura, T. Inoue, and M. Sugisaki. The EigenRumor Algorithm for Ranking Blogs. In WWW Workshop on the Weblogging Ecosystem, 2005.
  12. M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger. Pulse: Mining customer opinions from free text. In Proc. of the 6th International Symposium on Intelligent Data Analysis, pages 121--132, 2005.
  13. S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721--741, 1984.
  14. T. L. Griffiths and M. Steyvers. Finding scientific topics. Proc. of the National Academy of Sciences, 101 Suppl 1:5228--5235, 2004.
  15. T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum. Integrating topics and syntax. In Advances in Neural Information Processing Systems, 2004.
  16. A. Gruber, Y. Weiss, and M. Rosen-Zvi. Hidden Topic Markov Models. In Proc. of the Conference on Artificial Intelligence and Statistics, 2007.
  17. T. Hofmann. Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 42(1):177--196, 2001.
  18. M. Hu and B. Liu. Mining and summarizing customer reviews. In Proc. of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168--177, 2004.
  19. M. Hu and B. Liu. Mining Opinion Features in Customer Reviews. In Proc. of Nineteenth National Conference on Artificial Intelligence, 2004.
  20. W. Li and A. McCallum. Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. In Proc. Int. Conference on Machine Learning, 2006.
  21. Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai. Topic sentiment mixture: modeling facets and opinions in weblogs. In Proc. of the 16th Int. Conference on World Wide Web, pages 171--180, 2007.
  22. D. Mimno, W. Li, and A. McCallum. Mixtures of hierarchical topics with Pachinko allocation. In Proc. 24th Int. Conf. on Machine Learning (ICML), 2007.
  23. T. Minka and J. Lafferty. Expectation-propagation for the generative aspect model. In Proc. of the 18th Conf. on Uncertainty in Artificial Intelligence, 2002.
  24. I. Ounis, M. de Rijke, C. Macdonald, G. Mishne, and I. Soboroff. Overview of the TREC-2006 Blog Track. In Text REtrieval Conference (TREC), 2006.
  25. B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proc. of the Conf. on Empirical Methods in Natural Language Processing, 2002.
  26. F. Pereira, N. Tishby, and L. Lee. Distributional clustering of English words. In Proc. 31st Meeting of Association for Computational Linguistics, 1993.
  27. A. Popescu and O. Etzioni. Extracting product features and opinions from reviews. In Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2005.
  28. M. Purver, K. Kording, T. Griffiths, and J. Tenenbaum. Unsupervised topic modelling for multi-party spoken discourse. In Proc. of the Annual Meeting of the ACL and the International Conference on Computational Linguistics, pages 17--24, 2006.
  29. L. Saul and F. Pereira. Aggregate and mixed-order Markov models for statistical language processing. In Proc. of the 2nd Int. Conf. on Empirical Methods in Natural Language Processing, 1997.
  30. B. Snyder and R. Barzilay. Multiple Aspect Ranking using the Good Grief Algorithm. In Proc. of the Joint Conference of the North American Chapter of the Association for Computational Linguistics and Human Language Technologies, pages 300--307, 2007.
  31. P. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proc. of the Annual Meeting of the ACL, 2002.
  32. H. M. Wallach. Topic modeling: beyond bag-of-words. In Int. Conference on Machine Learning, 2006.
  33. X. Wang and A. McCallum. A note on topical n-grams. Technical Report UM-CS-2005-071, University of Massachusetts, 2005.
  34. J. Wiebe. Learning subjective adjectives from corpora. In Proc. of the National Conference on Artificial Intelligence, 2000.
  35. C. Zhai, A. Velivelli, and B. Yu. A Cross-Collection Mixture Model for Comparative Text Mining. In Proc. of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, pages 743--748, 2004.
  36. L. Zhuang, F. Jing, and X. Zhu. Movie review mining and summarization. In Proc. of the 15th ACM international conference on Information and knowledge management (CIKM), pages 43--50, 2006.

      • Published in

        WWW '08: Proceedings of the 17th international conference on World Wide Web
        April 2008
        1326 pages
        ISBN: 9781605580852
        DOI: 10.1145/1367497

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
