ABSTRACT
Weblogs and message boards provide online forums for discussion that record the voice of the public. Woven into this mass of discussion is a wide range of opinion and commentary about consumer products. This presents an opportunity for companies to understand and respond to the consumer by analyzing this unsolicited feedback. Given the volume, format and content of the data, the appropriate approach to understand this data is to use large-scale web and text data mining technologies.This paper argues that applications for mining large volumes of textual data for marketing intelligence should provide two key elements: a suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses. This paper presents such a system that gathers and annotates online discussion relating to consumer products using a wide variety of state-of-the-art techniques, including crawling, wrapping, search, text classification and computational linguistics. Marketing intelligence is derived through an interactive analysis framework uniquely configured to leverage the connectivity and content of annotated online discussion.
- S. Abney. Partial parsing via finite-state cascades. In Workshop on Robust Parsing, 8th European Summer School in Logic, Language and Information, 1996.]]Google ScholarDigital Library
- R. Agrawal, S. Rajagopalan, R. Srikant, and Y. Xu. Mining newsgroups using networks arising from social behavior. In Proceedings of the Twelfth International World Wide Web Conference (WWW2003), 2003.]] Google ScholarDigital Library
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487--499. Morgan Kaufmann, 12--15 1994.]] Google ScholarDigital Library
- R. Baumgartner, S. Flesca, and G. Gottlob. Declarative information extraction, Web crawling, and recursive wrapping with Lixto. Lecture Notes in Computer Science, 2173, 2001.]] Google ScholarDigital Library
- K. D. Bollacker, S. Lawrence, and C. L. Giles. CiteSeer: An autonomous web agent for automatic retrieval and identification of interesting publications. In Agents '98, pages 116--123, 1998.]] Google ScholarDigital Library
- H. Chen, J. Hu, and R. W. Sproat. Integrating geometric and linguistic analysis for e-mail signature block parsing. ACM Transactions on Information Systems, 17(4):343--366, 1999.]] Google ScholarDigital Library
- W. W. Cohen. Data integration using similarity joins and a word-based information representation language. ACM Transactions on Information Systems, 18(3):288---321, 2000.]] Google ScholarDigital Library
- W. W. Cohen, L. S. Jensen, and M. Hurst. A flexible learning system for wrapping tables and lists in HTML documents. In Proceedings of The Eleventh International World Wide Web Conference (WWW-2002), Honolulu, Hawaii, 2002.]] Google ScholarDigital Library
- M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. Learning to construct knowledge bases from the World Wide Web. Artificial Intelligence, 118(1--2):69--113, 2000.]] Google ScholarDigital Library
- N. Glance and W. Cohen. BoardViewer: Meta-search and community mapping over message boards. Intelliseek Technical Report, 2003.]]Google Scholar
- N. Glance, M. Hurst, and T. Tomokiyo. BlogPulse: Automated trend discovery for weblogs. In WWW 2004 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 2004.]]Google Scholar
- M. Hurst and K. Nigam. Retrieving topical sentiments from online document collections. In Document Recognition and Retrieval XI, pages 27--34, 2004.]]Google Scholar
- L. S. Jensen and W. Cohen. Grouping extracted fields. In Proceedings of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, 2001.]]Google Scholar
- T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In Machine Learning: ECML-98, Tenth European Conference on Machine Learning, 1998.]] Google ScholarDigital Library
- D. D. Lewis and J. Catlett. Heterogeneous uncertainty sampling for supervised learning. In Machine Learning: Proceedings of the Eleventh International Conference, 1994.]]Google ScholarDigital Library
- D. D. Lewis and W. A. Gale. A sequential algorithm for training text classifiers. In SIGIR '94, pages 3--12, 1994.]] Google ScholarDigital Library
- N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285--318, 1988.]] Google ScholarCross Ref
- A. McCallum and K. Nigam. Employing EM in pool-based active learning for text classification. In Machine Learning: Proceedings of the Fifteenth International Conference, pages 350--358, 1998.]] Google ScholarDigital Library
- J. Myllymaki. Effective web data extraction with standard XML technologies. In Proc. WWWW10, pages 689--696, May 2001.]] Google ScholarDigital Library
- T. Nasukawa, M. Morohashi, and T. Nagano. Customer claim mining: Discovering knowledge in vast amounts of textual data. Technical report, IBM Research, Japan, 1999.]]Google Scholar
- T. Nasukawa and J. Yi. Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of K-CAP '03, 2003.]] Google ScholarDigital Library
- K. Nigam and M. Hurst. Towards a robust metric of opinion. In AAAI Spring Symposium on Exploring Attitude and Affect in Text, 2004.]]Google Scholar
- B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? sentiment classification using machine learning techniques. In Proceedings of EMNLP 2002, 2002.]] Google ScholarDigital Library
- J. G. Shanahan, Y. Qu, and J. Weibe, editors. Computing Attitude and Affect in Text. Springer, Dordrecht, Netherlands, 2005.]] Google ScholarDigital Library
- T. Tomokiyo and M. Hurst. A language model approach to keyphrase extraction. In Proceedings of the ACL Workshop on Multiword Expressions, 2003.]] Google ScholarDigital Library
- Y. Yang. An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1/2):67--88, 1999.]] Google ScholarDigital Library
Index Terms
- Deriving marketing intelligence from online discussion
Recommendations
Analyzing online discussion for marketing intelligence
WWW '05: Special interest tracks and posters of the 14th international conference on World Wide WebWe present a system that gathers and analyzes online discussion as it relates to consumer products. Weblogs and online message boards provide forums that record the voice of the public. Woven into this discussion is a wide range of opinion and ...
Business intelligence in online customer textual reviews
Apply text mining and regression to analyze customer textual reviews.Identifies key attributes leading to customer satisfaction and dissatisfaction.Hotel star level significantly influences satisfaction and dissatisfaction. With the rapid development of ...
Blogger-Centric Contextual Advertising
Web advertising (online advertising), a form of advertising that uses the World Wide Web to attract customers, has become one of the most commonly-used marketing channels. This paper addresses the concept of Blogger-Centric Contextual Advertising, which ...
Comments