ABSTRACT
In recent years, weblogs, or blogs for short, have become an important form of online content. The personal nature of blogs, the online interactions between bloggers, and the temporal nature of blog entries differentiate blogs from other kinds of Web content. Bloggers interact by linking to each other's posts, thus forming online communities. Within these communities, bloggers discuss particular issues through entries in their blogs. Since these discussions are often initiated in response to online or offline events, a discussion typically lasts for a limited time. We wish to extract such temporal discussions, or stories, occurring within blogger communities, based on query keywords. We propose a Content-Community-Time model that leverages the content of entries, their timestamps, and the community structure of the blogs to automatically discover stories. Doing so also allows us to discover hot stories. We demonstrate the effectiveness of our model through several case studies using real-world data collected from the blogosphere.
Index Terms
- Mining blog stories using community-based and temporal clustering