A probabilistic approach to spatiotemporal theme pattern mining on weblogs

Published: 23 May 2006
DOI: 10.1145/1135777.1135857

ABSTRACT

Mining subtopics from weblogs and analyzing their spatiotemporal patterns have applications in multiple domains. In this paper, we define the novel problem of mining spatiotemporal theme patterns from weblogs and propose a probabilistic approach that models the subtopic themes and the spatiotemporal theme patterns simultaneously. The proposed model discovers spatiotemporal theme patterns by (1) extracting common themes from weblogs; (2) generating theme life cycles for each given location; and (3) generating theme snapshots for each given time period. The evolution of patterns can be discovered by comparative analysis of theme life cycles and theme snapshots. Experiments on three different data sets show that the proposed approach can discover interesting spatiotemporal theme patterns effectively. The proposed probabilistic model is general and can be used for spatiotemporal text mining in any domain with time and location information.
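
The abstract names theme life cycles and theme snapshots without giving their equations, so the following is only a minimal sketch of how such summaries might be computed, not the paper's model. It assumes that per-document theme weights have already been estimated by some mixture model, and it takes a life cycle to be a theme's share of weight over time within one location and a snapshot to be the share of weight over (theme, location) pairs within one time period. The function names, the data layout, and the toy weights are all hypothetical.

```python
from collections import defaultdict

def _normalize(groups):
    """Turn {group: {key: weight}} into {group: {key: share}}, summing to 1 per group."""
    result = {}
    for group, weights in groups.items():
        total = sum(weights.values())
        if total > 0:
            result[group] = {k: w / total for k, w in weights.items()}
    return result

def theme_life_cycles(docs):
    """Assumed definition: for each (theme, location), a distribution over time."""
    mass = defaultdict(lambda: defaultdict(float))
    for time, location, theme_weights in docs:
        for theme, w in theme_weights.items():
            mass[(theme, location)][time] += w
    return _normalize(mass)

def theme_snapshots(docs):
    """Assumed definition: for each time, a distribution over (theme, location)."""
    mass = defaultdict(lambda: defaultdict(float))
    for time, location, theme_weights in docs:
        for theme, w in theme_weights.items():
            mass[time][(theme, location)] += w
    return _normalize(mass)

# Toy input: (time period, location, {theme: weight}). Values are made up.
docs = [
    (1, "TX", {"storm": 0.8, "aid": 0.2}),
    (2, "TX", {"storm": 0.2, "aid": 0.8}),
    (1, "FL", {"storm": 0.5, "aid": 0.5}),
]

cycles = theme_life_cycles(docs)
snaps = theme_snapshots(docs)
print(cycles[("storm", "TX")])  # how "storm" is spread over time in TX
print(snaps[1])                 # how period 1 is split across themes and locations
```

Comparing `cycles` for the same theme across locations, or `snaps` across consecutive periods, corresponds to the comparative analysis the abstract describes.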

Published in

WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN: 1595933239
DOI: 10.1145/1135777

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
