ABSTRACT
User-generated information in online communities can be characterized as a mixture of a text stream and a network structure, both of which change over time. A good example is a blogging community, with its daily blog posts and a social network of bloggers.
An important task in analyzing an online community is to observe and track popular events, or topics that evolve over time in the community. Existing approaches usually focus on either the burstiness of topics or the evolution of networks, but ignore the interplay between textual topics and network structures.
In this paper, we formally define the problem of popular event tracking in online communities (PET), focusing on the interplay between texts and networks. We propose a novel statistical method that models the popularity of events over time, taking into consideration the burstiness of user interest, information diffusion on the network structure, and the evolution of textual topics. Specifically, a Gibbs Random Field is defined to model the influence of historical status and the dependency relationships in the graph; a topic model, regularized by the Gibbs Random Field, then generates the words in the text content of the event. We prove that two classic models in information diffusion and text burstiness are special cases of our model under certain conditions. Empirical experiments with two different communities and datasets (i.e., Twitter and DBLP) show that our approach is effective and outperforms existing approaches.
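To make the Gibbs Random Field idea concrete, the following is a minimal sketch and not the model defined in the paper. It assumes a quadratic energy with two terms, one penalizing change from each user's previous interest level and one penalizing disagreement between connected users. The function names, the weights `w_history` and `w_graph`, and the toy data are all hypothetical.

```python
import math

def gibbs_energy(current, previous, edges, w_history=1.0, w_graph=1.0):
    """Energy of an interest configuration (assumed quadratic form)."""
    # Historical term: penalize deviation from each user's previous interest.
    history_term = sum((current[u] - previous[u]) ** 2 for u in current)
    # Graph term: penalize disagreement between connected users.
    graph_term = sum((current[u] - current[v]) ** 2 for u, v in edges)
    return w_history * history_term + w_graph * graph_term

def unnormalized_prob(current, previous, edges, **weights):
    """Gibbs form: probability proportional to exp(-energy)."""
    return math.exp(-gibbs_energy(current, previous, edges, **weights))

# Toy community of three users on a chain a - b - c (hypothetical data).
previous = {"a": 0.2, "b": 0.2, "c": 0.8}
edges = [("a", "b"), ("b", "c")]
smooth = {"a": 0.25, "b": 0.3, "c": 0.7}   # small drift, neighbors agree
abrupt = {"a": 0.9, "b": 0.1, "c": 0.2}    # large jumps, neighbors disagree

print(gibbs_energy(smooth, previous, edges))
print(gibbs_energy(abrupt, previous, edges))
```

Under these assumptions, a configuration that drifts smoothly from the past and agrees along edges receives lower energy, and therefore higher unnormalized probability, than one with abrupt jumps.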
Index Terms
- PET: a statistical model for popular events tracking in social communities