ABSTRACT
Network structure and content in microblogging sites like Twitter influence each other: user A on Twitter follows user B for the tweets that B posts on the network, and A may then re-tweet the content shared by B to his or her own followers. In this paper, we propose a probabilistic model that jointly models link communities and content topics by leveraging both the social graph and the content shared by users. We model a community as a distribution over users, use it as a source for topics of interest, and jointly infer both communities and topics using Gibbs sampling. While modeling communities using the social graph and modeling topics using content have each received a great deal of attention, only a few recent approaches try to model topics in content-sharing platforms using both content and the social graph. Our work differs from the existing generative models in that we explicitly model the social graph of users along with the user-generated content, mimicking how the two entities co-evolve in content-sharing platforms. Recent studies have found Twitter to be more of a content-sharing network and less of a social network, and it seems hard to detect tightly knit communities from the follower-followee links alone. Still, the question of whether we can extract Twitter communities using both links and content is open. In this paper, we answer this question in the affirmative. Our model discovers coherent communities and topics, as evidenced by qualitative results on sub-graphs of Twitter users. Furthermore, we evaluate our model on the task of predicting follower-followee links. We show that joint modeling of links and content significantly improves link prediction performance on a sub-graph of Twitter (consisting of about 0.7 million users and over 27 million tweets), compared with generative models based on only structure or only content, and with path-based methods such as Katz.
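For readers unfamiliar with the path-based baseline named above, the following is a minimal sketch of the standard Katz index, which scores a candidate link from user i to user j by summing over paths of every length l, each weighted by beta**l. It is not the paper's implementation: the function name `katz_scores`, the damping value `beta=0.05`, the truncation length `max_len=5`, and the four-user toy graph are all assumptions chosen for illustration.

```python
def katz_scores(adj, beta=0.05, max_len=5):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * (A**l)[i][j].

    adj is a square 0/1 adjacency matrix given as a list of lists.
    beta and max_len are illustrative defaults, not values from the paper.
    """
    n = len(adj)
    power = [row[:] for row in adj]          # holds A**l, starting at l = 1
    scores = [[0.0] * n for _ in range(n)]
    weight = beta                            # holds beta**l
    for _ in range(max_len):
        # add the contribution of paths of the current length
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        # advance to the next path length: A**(l+1) = A**l times A
        power = [
            [sum(power[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        weight *= beta
    return scores


# Hypothetical toy follower graph: adj[u][v] = 1 means user u follows user v.
adj = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
scores = katz_scores(adj)
```

In this toy chain, user 0 reaches user 2 through one path of length two and user 3 through one path of length three, so the candidate link from 0 to 2 scores higher than the one from 0 to 3. A score like this uses only graph structure, which is the contrast the abstract draws with jointly modeling links and content.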
Index Terms
- Community detection in content-sharing social networks