ABSTRACT
The Internet is teeming with high variability phenomena, from measured IP flow sizes to aspects of inferred router-level connectivity, but there still exists considerable debate about how best to deal with this encountered high variability and model it. While one popular approach favors modeling highly variable event sizes with conventional, finite variance distributions such as lognormal or Weibull distributions, Mandelbrot has argued for the last 40 years that there are compelling mathematical, statistical, and practical reasons for why infinite variance distributions are natural candidates for capturing the essence behind high variability phenomena. In this paper, we elaborate on Mandelbrot's arguments and present a methodology that often allows for a clear distinction between the two approaches. In particular, by requiring the resulting models to be resilient to ambiguities (i.e., robust to real-world deficiencies in the underlying network measurements) and internally self-consistent (i.e., insensitive with respect the duration, location, or time of the data collection), we provide a rigorous framework for a qualitative assessment of the observed high variability. We apply the proposed framework to assess previously reported findings about measured Internet traffic and inferred router- and AS-level connectivity. In the process, we also discuss what our approach has to say about recent discussions concerning network traffic being Poisson or self-similar and router-level or AS-level connectivity graphs of the Internet being scale-free or not.
- R. Albert, H. Jeong, and A.-L. Barabási. Attack and error tolerance of complex networks, Nature 406, pp. 378--382, 2000.Google ScholarCross Ref
- D. Alderson. Technological and Economic Drivers and Constraints in the Internet's "Last Mile", Technical Report CIT-CDS-04-004, California Institute of Technology, 2004.Google Scholar
- A. Bookstein Informetric distributions, Part II: Resilience to ambiguity Journal of the Amer. Soc. for Information Science, Vol. 41, No. 5, pp. 376--386, 1990.Google ScholarCross Ref
- A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the Web, in: Proc. of 9th Intl. WWW Conference, pp. 309--320, 2000. Google ScholarDigital Library
- J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun. Internet traffic tends toward Poisson and independent as the load increases. in: Nonlinear Estimation and Classification, C. Holmes, D. Denison, M. Hansen, B. Yu, and B. Mallick (Eds.), Springer-Verlag, New York, 2002.Google Scholar
- H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger Towards Capturing Representative AS-Level Internet Topologies Proc. of ACM SIGMETRICS '02, 2002. Google ScholarDigital Library
- Q. Chen, H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger. The Origin of power laws in Internet topologies revisited. Proc. IEEE INFOCOM '02, New York, NY, 2002.Google Scholar
- K. Claffy, H.-W. Braun, and G. Polyzos. A parameterizable methodology for Internet traffic flow profiling, IEEE Journal on Selected Areas in Communications 13, pp. 1481--1494, 1995. Google ScholarDigital Library
- Cooperative Association for Internet Data Analysis (CAIDA), Skitter. Available from www.caida.org/tools/measurement/skitter/.Google Scholar
- A. B. Downey. The structural cause for file size distribution. www.allendowney.com/research/filesize/cds-tr25 2000.ps.gz, 2000.Google Scholar
- A. B. Downey. Evidence for long-tailed distributions in the Internet. Proc. 1st ACM SIGCOMM Internet Measurement Workshop '01, San Francisco, CA, pp. 229--241, 2001. Google ScholarDigital Library
- A. B. Downey. Lognormal and Pareto distributions in the Internet. www.allendowney.com/research/longtail/downey03lognormal.pdf, 2003.Google Scholar
- D. Draper, J. S. Hodges, C. L. Mallows and D. Pregibon. Exchangeability and data analysis, J. R. Statist. Soc. A 156, pp. 9--37, 1993.Google ScholarCross Ref
- N. Duffield, C. Lund, and M. Thorup. Estimating flow distributions from sampled flow statistics, Proc. ACM SIGCOMM '03, Karlruhe, Germany, 2003. Google ScholarDigital Library
- M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. Proc. ACM SIGCOMM '99, Cambridge, MA, pp. 251--262, 1999. Google ScholarDigital Library
- A. Feldmann, Characteristics of TCP connection arrivals, in: Self-Similar Network Traffic and Performance Evaluation, K. Park and W. Willinger (Eds.), J. Wiley & Sons, New York, pp. 367--400, 2000.Google ScholarCross Ref
- W. Feller. An Introduction to Probability Theory and its Applications, Volume II, Second Edition. Wiley & Sons, New York, 1966.Google Scholar
- D.R. Figueiredo, A. Feldmann, B. Liu, V. Misra, D. Towsley, and W. Willinger. On TCP and self-similar traffic. Performance Evaluation, 2004 (to appear). Google ScholarDigital Library
- S. Floyd and V. Paxson. Difficulties in simulating the Internet. IEEE/ACM Trans. Networking, Vol. 9, pp. 392--403, 2001. Google ScholarDigital Library
- C. M. Goldie and C. Klüppelberg. Subexponential distributions. in: A Practical Guide to Heavy Tails, R. J. Adler, R. E. Feldman, and M. S. Taqqu (Eds.), Birkhäuser, Boston, 1998. Google ScholarDigital Library
- R. Govindan and H. Tangmunarunkit. Heuristics for Internet Map Discovery, Proceeding of IEEE INFOCOM '00, 2000.Google ScholarCross Ref
- N.L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, Volume1, Second Edition J. Wiley & Sons, New York, 1994.Google Scholar
- T. Karagiannis, M. Faloutsos, M. Molle, and A. Broido. A nonstationary Poisson view of Internet traffic. Proc. IEEE INFOCOM '04, 2004.Google ScholarCross Ref
- T. G. Kurtz. Limit Theorems for Workload Input Models, in: Stochastic Networks: Theory and Applications, F.P. Kelly, S. Zachary, and I. Ziedins (Eds.), pp. 119--140, Oxford University Press, 1996.Google Scholar
- A. Lakhina, J. W. Byers, M. Crovella, and P. Xie. Sampling biases in IP topology measurements. Proc. IEEE INFOCOM '03, 2003.Google ScholarCross Ref
- L. Li, D. Alderson, W. Willinger, and J. Doyle. A first-principles approach to understanding the Internet's router-level topology. Proc. ACM SIGCOMM '04, Portland, Oregon, August 30 - September 2, 2004. Google ScholarDigital Library
- B. B. Mandelbrot. New methods in statistical economics, Journal of Political Economics 71, pp. 421--440, 1963.Google ScholarCross Ref
- B. B. Mandelbrot. Fractals and Scaling in Finance. Springer-Verlag, New York, 1997. Google ScholarDigital Library
- M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, Vol. 1, No. 2, pp. 226--251, 2004.Google ScholarCross Ref
- P. Newman, T. Lyons, and G. Minshall. Flow labelled IP: A connectionless approach to ATM, Proc. IEEE INFOCOM '96, 1996. Google ScholarDigital Library
- J.-J. Pansiot and D. Grad. On routers and multicast trees in the Internet. ACM Computer Communication Review, 28(1):41-50, January 1998. Google ScholarDigital Library
- Passive Measurement and Analysis (PMA) Project, http://pma.nlanr.net/Google Scholar
- V. Paxson and S. Floyd, Wide-area traffic: The failure of Poisson modeling, IEEE/ACM Trans. on Networking 3, pp. 226--244, 1995. Google ScholarDigital Library
- S. I. Resnick. Heavy tail modeling and teletraffic data. Annals of Statistics 25, pp. 1805--1869, 1997.Google ScholarCross Ref
- Route Views, University of Oregon Route Views Project, Available at http://www.antc.uoregon.edu/route-views/.Google Scholar
- N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP topologies with Rocketfuel. Proc. ACM SIGCOMM '02, Pittsburgh, PA, pp. 133--145, 2002. Google ScholarDigital Library
- G. Samorodnitsky and M. S. Taqqu. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York, 1994.Google Scholar
- J. W. Stewart III. BGP4: Inter-Domain ROuting in the Internet. Addison-Wesley, Boston, 1999. Google ScholarDigital Library
- M.S. Taqqu, W. Willinger, and R. Sherman. Proof of a fundamental result in self-similar traffic modeling, Computer Communication Review 27, pp. 5--23, 1997. Google ScholarDigital Library
- R. Teixeira, K. Marzullo, S. Savage, and G.M. Voelker. In Search of Path Diversity in ISP Networks, In Proc. IMC'03, Miami Beach, Florida. October 27-29, 2003. Google ScholarDigital Library
- J. W. Tukey. Data analysis and behavioral science or learning to bear the quantitative's man burden by shunning badmandments, in: The Collected Works of John W. Tukey, L. W. Jones (Ed.), Vol. III, Wadsworth, Monterey, CA, 1986.Google Scholar
- A. Veres and M. Boda. The chaotic nature of TCP congestion control. Proc. IEEE INFOCOM '00, 2000.Google ScholarCross Ref
- W. Willinger, R. Govindan, S. Jamin, V. Paxson and S. Shenker. Scaling phenomena in the Internet: Critically examining criticality. Proc. Nat. Acad. Sci. 99, suppl. 1, pp. 2573--2580, 2002.Google Scholar
- W. Willinger and V. Paxson, Where Mathematics meets the Internet, Notices of the AMS 45, pp. 961--970, 1998.Google Scholar
- S.-H. Yook, H. Jeong, and A.-L. Barabási. Modeling the Internet's large-scale topology. Proc. Nat. Acad. Sci. 99, pp. 13382--13386, 2002.Google ScholarCross Ref
- X. Zhu, J. Yu, and J. C. Doyle. Heavy tails, generalized coding, and optimal Web layout. Proc. IEEE INFOCOM '01, 2001.Google Scholar
Index Terms
- A pragmatic approach to dealing with high-variability in network measurements
Recommendations
Second-order heavy-tailed distributions and tail analysis
This correspondence studies the second-order distributions of heavy-tail distributed random variables (RVs). Two models for the heavy-tailed distributions are considered: power law and epsi-contaminated distributions. Special cases of the models ...
Random Walk with a Heavy-Tailed Jump Distribution
The classical random walk of which the one-step displacement variable u has a first finite negative moment is considered. The R.W. possesses an unique stationary distribution; x is a random variable with this distribution. It is assumed that the right-...
Reliability for extreme value distributions
In the area of stress-strength models there has been a large amount of work as regards estimation of the reliability R = Pr(X"2 < X"1) when X"1 and X"2 are independent random variables belonging to the same univariate family of distributions. The ...
Comments