Mining blog stories using community-based and temporal clustering

Authors:
Arun Qamra (UC Santa Barbara), Belle Tseng (NEC Labs America, Cupertino), Edward Y. Chang (UC Santa Barbara)
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management, November 2006, Pages 58–67
https://doi.org/10.1145/1183614.1183627

Published: 06 November 2006

ABSTRACT

In recent years, weblogs, or blogs for short, have become an important form of online content. The personal nature of blogs, the online interactions between bloggers, and the temporal nature of blog entries differentiate blogs from other kinds of Web content. Bloggers interact by linking to each other's posts, thus forming online communities. Within these communities, bloggers discuss particular issues through entries in their blogs. Since these discussions are often initiated in response to online or offline events, a discussion typically lasts for a limited time. We wish to extract such temporal discussions, or stories, occurring within blogger communities, given a set of query keywords. We propose a Content-Community-Time model that leverages the content of entries, their timestamps, and the community structure of the blogs to automatically discover stories. Doing so also allows us to discover hot stories. We demonstrate the effectiveness of our model through several case studies using real-world data collected from the blogosphere.
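
The abstract does not spell out how the Content-Community-Time model works, so the following is only an illustrative sketch and not the authors' method. It shows one simple way the three signals named above (entry content, timestamps, and community structure) could be combined: a greedy, single-pass grouping that multiplies a content similarity by a temporal decay and a same-community bonus. Every name and parameter here (`Entry`, `Story`, `find_stories`, `community_of`, `threshold`, `decay_days`, `same_community_boost`) is a hypothetical choice made for the example, and the community assignment is assumed to be given.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import exp, sqrt


@dataclass
class Entry:
    blog: str    # identifier of the blog that published the entry
    day: float   # timestamp, in days since an arbitrary origin
    text: str    # entry content


@dataclass
class Story:
    community: int    # community of the story's first entry (a simplification)
    last_day: float   # timestamp of the newest entry in the story
    terms: Counter = field(default_factory=Counter)
    entries: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_stories(entries, community_of, threshold=0.3,
                 decay_days=3.0, same_community_boost=1.5):
    """Greedily group entries into stories (hypothetical heuristic).

    community_of maps a blog identifier to a community label and is
    assumed to come from a separate community-detection step.
    """
    stories = []
    for entry in sorted(entries, key=lambda e: e.day):
        terms = Counter(entry.text.lower().split())
        community = community_of[entry.blog]
        best, best_score = None, 0.0
        for story in stories:
            # Content signal: similarity to the story's accumulated terms.
            score = cosine(terms, story.terms)
            # Time signal: older stories attract new entries less strongly.
            score *= exp(-(entry.day - story.last_day) / decay_days)
            # Community signal: favor stories from the same community.
            if community == story.community:
                score *= same_community_boost
            if score > best_score:
                best, best_score = story, score
        if best is None or best_score < threshold:
            # No sufficiently close story: start a new one.
            best = Story(community=community, last_day=entry.day)
            stories.append(best)
        best.terms.update(terms)
        best.entries.append(entry)
        best.last_day = entry.day
    return stories
```

With these assumed defaults, two same-community entries that share most of their vocabulary and are posted a day apart land in one story, while an entry with similar wording posted weeks later starts a new story because the temporal decay drives its score below the threshold.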

Index Terms

Information systems > Information systems applications > Data mining

Published in
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
November 2006
916 pages
ISBN: 1595934332
DOI: 10.1145/1183614
General Chair: Philip S. Yu, IBM T.J. Watson Research Center (USA)
Program Chairs: Vassilis Tsotras, University of California-Riverside (USA); Edward Fox, Virginia Tech (USA); Bing Liu, University of Illinois at Chicago (USA)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
online-communities
time-sensitive clustering
weblogs
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%