DOI: 10.1145/1081870.1081919
Article

Deriving marketing intelligence from online discussion

Published: 21 August 2005

ABSTRACT

Weblogs and message boards provide online forums for discussion that record the voice of the public. Woven into this mass of discussion is a wide range of opinion and commentary about consumer products. This presents an opportunity for companies to understand and respond to the consumer by analyzing this unsolicited feedback. Given the volume, format and content of the data, the appropriate approach to understand this data is to use large-scale web and text data mining technologies.

This paper argues that applications for mining large volumes of textual data for marketing intelligence should provide two key elements: a suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses. This paper presents such a system that gathers and annotates online discussion relating to consumer products using a wide variety of state-of-the-art techniques, including crawling, wrapping, search, text classification and computational linguistics. Marketing intelligence is derived through an interactive analysis framework uniquely configured to leverage the connectivity and content of annotated online discussion.

    • Published in

      KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
      August 2005
      844 pages
      ISBN: 159593135X
      DOI: 10.1145/1081870

      Copyright © 2005 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
