DOI: 10.1145/1183614.1183627
Article

Mining blog stories using community-based and temporal clustering

Published: 06 November 2006

ABSTRACT

In recent years, weblogs, or blogs for short, have become an important form of online content. The personal nature of blogs, the online interactions between bloggers, and the temporal nature of blog entries differentiate blogs from other kinds of Web content. Bloggers interact by linking to each other's posts, thus forming online communities. Within these communities, bloggers discuss particular issues through entries in their blogs. Since these discussions are often initiated in response to online or offline events, a discussion typically lasts for a limited time. We wish to extract such temporal discussions, or stories, occurring within blogger communities, given a set of query keywords. We propose a Content-Community-Time model that leverages the content of entries, their timestamps, and the community structure of the blogs to discover stories automatically. Doing so also allows us to discover hot stories. We demonstrate the effectiveness of our model through several case studies using real-world data collected from the blogosphere.


      Published in

      CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
      November 2006
      916 pages
      ISBN: 1595934332
      DOI: 10.1145/1183614

      Copyright © 2006 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
