DOI: 10.1145/1835804.1835922
research-article

PET: a statistical model for popular events tracking in social communities

Published: 25 July 2010

ABSTRACT

User-generated information in online communities is characterized by a mixture of a text stream and a network structure, both changing over time. A good example is a web-blogging community, with its daily blog posts and a social network of bloggers.

An important task in analyzing an online community is to observe and track the popular events, or topics, that evolve over time in the community. Existing approaches usually focus on either the burstiness of topics or the evolution of networks, but ignore the interplay between textual topics and network structures.

In this paper, we formally define the problem of popular event tracking in online communities (PET), focusing on the interplay between texts and networks. We propose a novel statistical method that models the popularity of events over time, taking into consideration the burstiness of user interest, information diffusion on the network structure, and the evolution of textual topics. Specifically, a Gibbs Random Field is defined to model the influence of historic status and the dependency relationships in the graph; thereafter, a topic model generates the words in the text content of the event, regularized by the Gibbs Random Field. We prove that two classic models in information diffusion and text burstiness are special cases of our model under certain conditions. Empirical experiments with two different communities and datasets (i.e., Twitter and DBLP) show that our approach is effective and outperforms existing approaches.
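
As a rough illustration of the Gibbs Random Field idea mentioned above, the following toy sketch assigns each user a binary interest level and scores a configuration by how much it departs from the previous time step and from neighbouring users. The energy function, the weights `w_hist` and `w_edge`, and the three-user graph are all assumptions made for this example; the abstract does not state the potentials actually used in PET, and the topic-model component is not shown.

```python
import itertools
import math


def gibbs_distribution(edges, prev_state, w_hist=1.0, w_edge=0.5):
    """Enumerate a toy Gibbs random field over binary interest levels.

    Hypothetical energy (not the paper's): squared disagreement with each
    node's previous state, plus squared disagreement across each edge.
    """
    n = len(prev_state)
    energies = {}
    for state in itertools.product((0, 1), repeat=n):
        # Penalty for changing relative to the previous time step.
        hist = sum((state[i] - prev_state[i]) ** 2 for i in range(n))
        # Penalty for differing from a network neighbour.
        pair = sum((state[i] - state[j]) ** 2 for i, j in edges)
        energies[state] = w_hist * hist + w_edge * pair
    # Gibbs form: P(state) is proportional to exp(-energy).
    z = sum(math.exp(-e) for e in energies.values())
    return {s: math.exp(-e) / z for s, e in energies.items()}


# Three users in a chain 0 - 1 - 2; users 0 and 1 were interested before.
edges = [(0, 1), (1, 2)]
prev_state = (1, 1, 0)
dist = gibbs_distribution(edges, prev_state)
most_likely = max(dist, key=dist.get)
print(most_likely)
```

Exhaustive enumeration is only feasible for a handful of nodes; it is used here purely to make the normalisation explicit.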


Supplemental Material

kdd2010_xide_lin_pet_01.mov (mov, 138.4 MB)


    • Published in

      KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
      July 2010
      1240 pages
      ISBN: 9781450300551
      DOI: 10.1145/1835804

      Copyright © 2010 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
