ABSTRACT
Mining subtopics from weblogs and analyzing their spatiotemporal patterns have applications in multiple domains. In this paper, we define the novel problem of mining spatiotemporal theme patterns from weblogs and propose a probabilistic approach that models subtopic themes and spatiotemporal theme patterns simultaneously. The proposed model discovers spatiotemporal theme patterns by (1) extracting common themes from weblogs; (2) generating theme life cycles for each given location; and (3) generating theme snapshots for each given time period. The evolution of patterns can then be discovered by comparative analysis of theme life cycles and theme snapshots. Experiments on three different data sets show that the proposed approach can effectively discover interesting spatiotemporal theme patterns. The proposed probabilistic model is general and can be used for spatiotemporal text mining in any domain with time and location information.
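To make steps (2) and (3) concrete, the following is a minimal sketch, not the model proposed in the paper. It assumes that some fitted model has already produced a weight for every (theme, time, location) cell, and it reads a "theme life cycle" as the distribution over time for a fixed theme and location, and a "theme snapshot" as the distribution over themes and locations for a fixed time period. Both readings, the function names, and the toy weights are assumptions introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical toy input: weight of each (theme, time, location) cell,
# e.g. expected word counts attributed to a theme by some fitted model.
# These numbers are invented for illustration and are not results.
weights = {
    ("storm", "week1", "TX"): 6.0,
    ("storm", "week2", "TX"): 2.0,
    ("storm", "week1", "FL"): 1.0,
    ("storm", "week2", "FL"): 3.0,
    ("aid", "week1", "TX"): 2.0,
    ("aid", "week2", "TX"): 2.0,
}


def theme_life_cycles(weights):
    """Normalize over time for each (theme, location) pair.

    Returns {(theme, location): {time: p(time | theme, location)}}.
    """
    totals = defaultdict(float)
    for (theme, _time, loc), w in weights.items():
        totals[(theme, loc)] += w
    cycles = defaultdict(dict)
    for (theme, time, loc), w in weights.items():
        cycles[(theme, loc)][time] = w / totals[(theme, loc)]
    return dict(cycles)


def theme_snapshots(weights):
    """Normalize over (theme, location) for each time period.

    Returns {time: {(theme, location): p(theme, location | time)}}.
    """
    totals = defaultdict(float)
    for (_theme, time, _loc), w in weights.items():
        totals[time] += w
    snapshots = defaultdict(dict)
    for (theme, time, loc), w in weights.items():
        snapshots[time][(theme, loc)] = w / totals[time]
    return dict(snapshots)


cycles = theme_life_cycles(weights)
snapshots = theme_snapshots(weights)

# Life cycle of the "storm" theme in TX: how its strength is spread over time.
print(cycles[("storm", "TX")])
# Snapshot for week1: how themes are spread over locations in that period.
print(snapshots["week1"])
```

Comparing the life cycles of one theme across locations, or the snapshots of consecutive time periods, corresponds to the comparative analysis mentioned above. Under these assumptions each life cycle and each snapshot sums to one, so they can be compared as probability distributions.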