Modeling Temporal Activity to Detect Anomalous Behavior in Social Media

Authors:
Alceu Ferraz Costa (ORCID: 0000-0003-1716-9577)
University of São Paulo, Avenida Trabalhador São-carlense, Centro

Yuto Yamaguchi
Tsukuba University, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan

Agma Juci Machado Traina
University of São Paulo, Avenida Trabalhador São-carlense, Centro

Caetano Traina Jr.
University of São Paulo, Avenida Trabalhador São-carlense, Centro

Christos Faloutsos
Carnegie Mellon University, Forbes Avenue Pittsburgh, PA
ACM Transactions on Knowledge Discovery from Data, Volume 11, Issue 4, Article No. 49, pp. 1–23. https://doi.org/10.1145/3064884

Published: 14 July 2017

Abstract

Social media has become a popular and important tool for human communication. However, due to this popularity, spam and the distribution of malicious content by computer-controlled users, known as bots, have become a widespread problem. At the same time, when users use social media, they generate valuable data that can be used to understand the patterns of human communication. In this article, we focus on the following important question: Can we identify and use patterns of human communication to decide whether a human or a bot controls a user? The first contribution of this article is showing that the distribution of inter-arrival times (IATs) between postings is characterized by the following four patterns: (i) heavy tails, (ii) periodic spikes, (iii) correlation between consecutive values, and (iv) bimodality. As our second contribution, we propose a mathematical model named Act-M (Activity Model). We show that Act-M can accurately fit the distribution of IATs from social media users. Finally, we use Act-M to develop a method that detects whether users are bots based only on the timing of their postings. We validate Act-M using data from over 55 million postings from four social media services: Reddit, Twitter, Stack-Overflow, and Hacker-News. Our experiments show that Act-M provides a more accurate fit to the data than existing models for human dynamics. Additionally, when detecting bots at a sensitivity of 70%, Act-M achieved a precision higher than 93% on the Twitter dataset and higher than 77% on the Reddit dataset.
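To make the quantity studied in the abstract concrete, the sketch below computes inter-arrival times (IATs) from one user's posting timestamps, counts them in logarithmic bins (a common way to inspect heavy-tailed and bimodal distributions), and pairs consecutive IATs so their correlation can be examined. This is only an illustration and not the Act-M model, whose definition is not given in the abstract. The function names, the choice of base-2 bins, and the sample timestamps are all assumptions made for this example.

```python
import math
from collections import Counter


def inter_arrival_times(timestamps):
    """Return the gaps between consecutive postings, after sorting by time."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def log2_histogram(iats):
    """Count IATs in base-2 logarithmic bins (assumed binning, for illustration).

    Bin k holds every IAT with 2**k <= iat < 2**(k + 1).
    """
    counts = Counter()
    for iat in iats:
        if iat > 0:  # a zero gap has no logarithm; this sketch skips it
            counts[math.floor(math.log2(iat))] += 1
    return dict(sorted(counts.items()))


def consecutive_pairs(iats):
    """Pair each IAT with the next one, to inspect correlation between them."""
    return list(zip(iats, iats[1:]))


# Hypothetical posting times in seconds for a single user.
sample_timestamps = [0, 1, 3, 7, 15, 16]
sample_iats = inter_arrival_times(sample_timestamps)
sample_hist = log2_histogram(sample_iats)
sample_pairs = consecutive_pairs(sample_iats)
```

Logarithmic bins are used here because IATs in real posting data span many orders of magnitude, from seconds to days, so linear bins would hide most of the structure.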

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
ACM Transactions on Knowledge Discovery from Data, Volume 11, Issue 4
Special Issue on KDD 2016 and Regular Papers
November 2017
419 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3119906
Editor: Jie Tang, Tsinghua University, China
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 July 2017
- Accepted: 1 March 2017
- Revised: 1 October 2016
- Received: 1 January 2016
Author Tags
Social media
anomaly detection
communication dynamics
inter-arrival times
Qualifiers
- research-article
- Research
- Refereed