Universal and Distinct Properties of Communication Dynamics: How to Generate Realistic Inter-event Times

Published: 01 April 2015

Abstract

As information systems advance, means of communication are becoming cheaper, faster, and more widely available. Today, millions of people carrying smartphones or tablets can communicate at practically any time and from anywhere. They can access their e-mail, comment on weblogs, watch and post videos and photos (and comment on them), and make phone calls or send text messages almost ubiquitously. Given this scenario, in this article we tackle a fundamental aspect of this new era of communication: how the time intervals between communication events behave across different technologies and means of communication. Are there universal patterns in the Inter-Event Time Distribution (IED)? How do inter-event times differ among particular technologies? To answer these questions, we analyzed eight datasets of real, modern communication data and found four well-defined patterns present in all eight. Moreover, we propose the Self-Feeding Process (SFP) to generate inter-event times between communications. The SFP is an extremely parsimonious point process that requires at most two parameters and can generate inter-event times with all the universal properties we observed in the data. We also show three potential applications of the SFP: as a framework for generating synthetic datasets containing realistic communication events for any of the analyzed means of communication, as a technique for detecting anomalies, and as a building block for more specific models that aim to capture the particularities of each analyzed system.
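The abstract does not state the SFP's update rule, so the following is only a minimal sketch of what a "self-feeding" generator of inter-event times could look like. It assumes that each inter-event time is drawn from an exponential distribution whose mean is the previous inter-event time plus a constant `mu / e`. That rule, the starting value, and the names `generate_inter_event_times`, `to_timestamps`, and `mu` are illustrative assumptions and should not be read as the authors' definition.

```python
import math
import random


def generate_inter_event_times(n, mu, seed=None):
    """Draw n inter-event times from an ASSUMED self-feeding rule.

    Assumption (not taken from the abstract): the next gap is
    exponentially distributed with mean = previous gap + mu / e.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    rng = random.Random(seed)
    previous = mu  # assumed starting value for the chain
    gaps = []
    for _ in range(n):
        mean = previous + mu / math.e
        # random.expovariate takes the rate, i.e. 1 / mean.
        gap = rng.expovariate(1.0 / mean)
        gaps.append(gap)
        previous = gap  # the process "feeds" on its own last output
    return gaps


def to_timestamps(gaps, start=0.0):
    """Accumulate inter-event times into event timestamps."""
    timestamps = []
    current = start
    for gap in gaps:
        current += gap
        timestamps.append(current)
    return timestamps


if __name__ == "__main__":
    gaps = generate_inter_event_times(10, mu=60.0, seed=42)
    for t in to_timestamps(gaps):
        print(f"{t:.1f}")
```

Because each gap depends only on the one before it, a single parameter (`mu` here) is enough to drive the sketch, which is in the spirit of the "at most two parameters" claim. Whether this particular rule reproduces the four patterns reported in the article is not something the sketch establishes.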

    • Published in

      ACM Transactions on Knowledge Discovery from Data  Volume 9, Issue 3
      TKDD Special Issue (SIGKDD'13)
      April 2015
      313 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2737800
      Copyright © 2015 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 April 2015
      • Accepted: 1 November 2014
      • Revised: 1 October 2014
      • Received: 1 March 2014

      Qualifiers

      • research-article
      • Research
      • Refereed
