ABSTRACT
How often do individuals perform a given communication activity on the Web, such as posting comments on blogs or news? Can we build a generative model that creates communication events with realistic inter-event time distributions (IEDs)? Which properties should such a model strive to match? The current literature reports seemingly contradictory results for IEDs: some studies claim good fits with power laws, others with non-homogeneous Poisson processes. Given these two approaches, we ask: which one is correct? Can we reconcile them? We show here that, surprisingly, both approaches are correct, each being a corner case of the proposed Self-Feeding Process (SFP). We show that the SFP (a) has unifying power, generating power-law tails (including the so-called "top-concavity" that real data exhibit) as well as short-term Poisson behavior; (b) avoids the "i.i.d. fallacy", which none of the prevailing models has studied before; and (c) is extremely parsimonious, usually requiring only one parameter and, in general, at most two. Experiments conducted on eight large, diverse real datasets (e.g., YouTube and blog comments, e-mails, and SMS messages) reveal that the SFP mimics their properties very well.
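The abstract does not state the SFP's update rule, so the following is only a minimal sketch of what a one-parameter, non-i.i.d. generator of inter-event times could look like. It assumes a Markovian recurrence in which each gap is drawn from an exponential distribution whose mean is the previous gap plus a constant `mu / e`. The function name, the starting value, and the recurrence itself are assumptions made for illustration, not details taken from the text above.

```python
import math
import random


def generate_inter_event_times(n, mu, seed=None):
    """Draw n inter-event times from an illustrative self-feeding recurrence.

    Assumption (not from the abstract): the mean of each exponential draw
    is the previous gap plus mu / e, so consecutive gaps are correlated
    rather than i.i.d. The single parameter mu sets the time scale.
    """
    if n < 0 or mu <= 0:
        raise ValueError("n must be non-negative and mu must be positive")
    rng = random.Random(seed)
    gaps = []
    prev = mu  # assumed starting value for the recurrence
    for _ in range(n):
        mean = prev + mu / math.e   # mean depends on the previous gap
        gap = rng.expovariate(1.0 / mean)  # expovariate takes a rate, 1/mean
        gaps.append(gap)
        prev = gap
    return gaps


if __name__ == "__main__":
    # Example: 10 gaps with a time scale of 60 (e.g., seconds).
    for gap in generate_inter_event_times(10, 60.0, seed=42):
        print(round(gap, 2))
```

Because each draw feeds the next, a short gap tends to be followed by another short gap and a long gap by another long one, which is the kind of dependence the "i.i.d. fallacy" remark refers to. Whether this particular recurrence reproduces the reported power-law tails should be checked against the paper itself.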
Index Terms
- The self-feeding process: a unifying model for communication dynamics in the web