Article

A probabilistic approach to spatiotemporal theme pattern mining on weblogs

Authors:
Qiaozhu Mei, University of Illinois at Urbana-Champaign, Urbana, IL
Chao Liu, University of Illinois at Urbana-Champaign, Urbana, IL
Hang Su, Vanderbilt University, Nashville, TN
ChengXiang Zhai, University of Illinois at Urbana-Champaign, Urbana, IL

WWW '06: Proceedings of the 15th international conference on World Wide Web, May 2006, Pages 533–542
https://doi.org/10.1145/1135777.1135857

Published: 23 May 2006

ABSTRACT

Mining subtopics from weblogs and analyzing their spatiotemporal patterns have applications in multiple domains. In this paper, we define the novel problem of mining spatiotemporal theme patterns from weblogs and propose a novel probabilistic approach to model the subtopic themes and spatiotemporal theme patterns simultaneously. The proposed model discovers spatiotemporal theme patterns by (1) extracting common themes from weblogs; (2) generating theme life cycles for each given location; and (3) generating theme snapshots for each given time period. Evolution of patterns can be discovered by comparative analysis of theme life cycles and theme snapshots. Experiments on three different data sets show that the proposed approach can discover interesting spatiotemporal theme patterns effectively. The proposed probabilistic model is general and can be used for spatiotemporal text mining on any domain with time and location information.
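
As a rough illustration of steps (2) and (3), the sketch below assumes that theme strengths per (theme, time period, location) are already available, for example from a fitted mixture model that is not shown here. It then reads a theme life cycle as a distribution over time for a fixed theme and location, and a theme snapshot as a distribution over (theme, location) pairs for a fixed time period. This is one plausible reading of the abstract, not the paper's estimation procedure. The function names, the `weights` layout, and the toy values are all hypothetical.

```python
def theme_life_cycle(weights, theme, location):
    """Distribution over time periods for one theme at one location.

    `weights` maps (theme, time, location) -> non-negative strength.
    This layout is an assumption made for the sketch.
    """
    series = {t: w for (th, t, loc), w in weights.items()
              if th == theme and loc == location}
    total = sum(series.values())
    if total == 0:
        return {}
    return {t: w / total for t, w in sorted(series.items())}


def theme_snapshot(weights, time):
    """Distribution over (theme, location) pairs within one time period."""
    cells = {(th, loc): w for (th, t, loc), w in weights.items() if t == time}
    total = sum(cells.values())
    if total == 0:
        return {}
    return {key: w / total for key, w in cells.items()}


# Hypothetical toy strengths: (theme, time period, location) -> weight.
weights = {
    ("storm", 1, "TX"): 2.0,
    ("storm", 2, "TX"): 6.0,
    ("storm", 1, "IL"): 1.0,
    ("relief", 1, "TX"): 1.0,
    ("relief", 2, "TX"): 3.0,
    ("relief", 2, "IL"): 1.0,
}

life_cycle = theme_life_cycle(weights, "storm", "TX")
snapshot = theme_snapshot(weights, 2)
```

Comparing `life_cycle` across locations, or `snapshot` across time periods, corresponds to the comparative analysis the abstract mentions. The sketch only normalizes given weights and does not discover the themes themselves.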

Index Terms

1. Information systems
  1. Information retrieval

Published in
WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN:1595933239
DOI:10.1145/1135777
General Chairs: Leslie Carr (University of Southampton), David De Roure (University of Southampton), Arun Iyengar (IBM Research)
Program Chairs: Carole Goble (University of Manchester, UK), Mike Dahlin (University of Texas at Austin)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
mixture model
spatiotemporal text mining
theme pattern
weblog
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%