Article

A pragmatic approach to dealing with high-variability in network measurements

Authors:
Walter Willinger

AT&T Labs Research

AT&T Labs Research
View Profile

,
David Alderson

California Institute of Technology

California Institute of Technology
View Profile

,
Lun Li

California Institute of Technology

California Institute of Technology
View Profile

IMC '04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurementOctober 2004Pages 88–100https://doi.org/10.1145/1028788.1028800

Published:25 October 2004Publication History

IMC '04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement

Pages 88–100

ABSTRACT

The Internet is teeming with high variability phenomena, from measured IP flow sizes to aspects of inferred router-level connectivity, but there still exists considerable debate about how best to deal with this encountered high variability and model it. While one popular approach favors modeling highly variable event sizes with conventional, finite variance distributions such as lognormal or Weibull distributions, Mandelbrot has argued for the last 40 years that there are compelling mathematical, statistical, and practical reasons for why infinite variance distributions are natural candidates for capturing the essence behind high variability phenomena. In this paper, we elaborate on Mandelbrot's arguments and present a methodology that often allows for a clear distinction between the two approaches. In particular, by requiring the resulting models to be resilient to ambiguities (i.e., robust to real-world deficiencies in the underlying network measurements) and internally self-consistent (i.e., insensitive with respect the duration, location, or time of the data collection), we provide a rigorous framework for a qualitative assessment of the observed high variability. We apply the proposed framework to assess previously reported findings about measured Internet traffic and inferred router- and AS-level connectivity. In the process, we also discuss what our approach has to say about recent discussions concerning network traffic being Poisson or self-similar and router-level or AS-level connectivity graphs of the Internet being scale-free or not.

References

R. Albert, H. Jeong, and A.-L. Barabási. Attack and error tolerance of complex networks, Nature 406, pp. 378--382, 2000.Google ScholarCross Ref
D. Alderson. Technological and Economic Drivers and Constraints in the Internet's "Last Mile", Technical Report CIT-CDS-04-004, California Institute of Technology, 2004.Google Scholar
A. Bookstein Informetric distributions, Part II: Resilience to ambiguity Journal of the Amer. Soc. for Information Science, Vol. 41, No. 5, pp. 376--386, 1990.Google ScholarCross Ref
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the Web, in: Proc. of 9th Intl. WWW Conference, pp. 309--320, 2000. Google ScholarDigital Library
J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun. Internet traffic tends toward Poisson and independent as the load increases. in: Nonlinear Estimation and Classification, C. Holmes, D. Denison, M. Hansen, B. Yu, and B. Mallick (Eds.), Springer-Verlag, New York, 2002.Google Scholar
H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger Towards Capturing Representative AS-Level Internet Topologies Proc. of ACM SIGMETRICS '02, 2002. Google ScholarDigital Library
Q. Chen, H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger. The Origin of power laws in Internet topologies revisited. Proc. IEEE INFOCOM '02, New York, NY, 2002.Google Scholar
K. Claffy, H.-W. Braun, and G. Polyzos. A parameterizable methodology for Internet traffic flow profiling, IEEE Journal on Selected Areas in Communications 13, pp. 1481--1494, 1995. Google ScholarDigital Library
Cooperative Association for Internet Data Analysis (CAIDA), Skitter. Available from www.caida.org/tools/measurement/skitter/.Google Scholar
A. B. Downey. The structural cause for file size distribution. www.allendowney.com/research/filesize/cds-tr25 2000.ps.gz, 2000.Google Scholar
A. B. Downey. Evidence for long-tailed distributions in the Internet. Proc. 1st ACM SIGCOMM Internet Measurement Workshop '01, San Francisco, CA, pp. 229--241, 2001. Google ScholarDigital Library
A. B. Downey. Lognormal and Pareto distributions in the Internet. www.allendowney.com/research/longtail/downey03lognormal.pdf, 2003.Google Scholar
D. Draper, J. S. Hodges, C. L. Mallows and D. Pregibon. Exchangeability and data analysis, J. R. Statist. Soc. A 156, pp. 9--37, 1993.Google ScholarCross Ref
N. Duffield, C. Lund, and M. Thorup. Estimating flow distributions from sampled flow statistics, Proc. ACM SIGCOMM '03, Karlruhe, Germany, 2003. Google ScholarDigital Library
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. Proc. ACM SIGCOMM '99, Cambridge, MA, pp. 251--262, 1999. Google ScholarDigital Library
A. Feldmann, Characteristics of TCP connection arrivals, in: Self-Similar Network Traffic and Performance Evaluation, K. Park and W. Willinger (Eds.), J. Wiley & Sons, New York, pp. 367--400, 2000.Google ScholarCross Ref
W. Feller. An Introduction to Probability Theory and its Applications, Volume II, Second Edition. Wiley & Sons, New York, 1966.Google Scholar
D.R. Figueiredo, A. Feldmann, B. Liu, V. Misra, D. Towsley, and W. Willinger. On TCP and self-similar traffic. Performance Evaluation, 2004 (to appear). Google ScholarDigital Library
S. Floyd and V. Paxson. Difficulties in simulating the Internet. IEEE/ACM Trans. Networking, Vol. 9, pp. 392--403, 2001. Google ScholarDigital Library
C. M. Goldie and C. Klüppelberg. Subexponential distributions. in: A Practical Guide to Heavy Tails, R. J. Adler, R. E. Feldman, and M. S. Taqqu (Eds.), Birkhäuser, Boston, 1998. Google ScholarDigital Library
R. Govindan and H. Tangmunarunkit. Heuristics for Internet Map Discovery, Proceeding of IEEE INFOCOM '00, 2000.Google ScholarCross Ref
N.L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, Volume1, Second Edition J. Wiley & Sons, New York, 1994.Google Scholar
T. Karagiannis, M. Faloutsos, M. Molle, and A. Broido. A nonstationary Poisson view of Internet traffic. Proc. IEEE INFOCOM '04, 2004.Google ScholarCross Ref
T. G. Kurtz. Limit Theorems for Workload Input Models, in: Stochastic Networks: Theory and Applications, F.P. Kelly, S. Zachary, and I. Ziedins (Eds.), pp. 119--140, Oxford University Press, 1996.Google Scholar
A. Lakhina, J. W. Byers, M. Crovella, and P. Xie. Sampling biases in IP topology measurements. Proc. IEEE INFOCOM '03, 2003.Google ScholarCross Ref
L. Li, D. Alderson, W. Willinger, and J. Doyle. A first-principles approach to understanding the Internet's router-level topology. Proc. ACM SIGCOMM '04, Portland, Oregon, August 30 - September 2, 2004. Google ScholarDigital Library
B. B. Mandelbrot. New methods in statistical economics, Journal of Political Economics 71, pp. 421--440, 1963.Google ScholarCross Ref
B. B. Mandelbrot. Fractals and Scaling in Finance. Springer-Verlag, New York, 1997. Google ScholarDigital Library
M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, Vol. 1, No. 2, pp. 226--251, 2004.Google ScholarCross Ref
P. Newman, T. Lyons, and G. Minshall. Flow labelled IP: A connectionless approach to ATM, Proc. IEEE INFOCOM '96, 1996. Google ScholarDigital Library
J.-J. Pansiot and D. Grad. On routers and multicast trees in the Internet. ACM Computer Communication Review, 28(1):41-50, January 1998. Google ScholarDigital Library
Passive Measurement and Analysis (PMA) Project, http://pma.nlanr.net/Google Scholar
V. Paxson and S. Floyd, Wide-area traffic: The failure of Poisson modeling, IEEE/ACM Trans. on Networking 3, pp. 226--244, 1995. Google ScholarDigital Library
S. I. Resnick. Heavy tail modeling and teletraffic data. Annals of Statistics 25, pp. 1805--1869, 1997.Google ScholarCross Ref
Route Views, University of Oregon Route Views Project, Available at http://www.antc.uoregon.edu/route-views/.Google Scholar
N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP topologies with Rocketfuel. Proc. ACM SIGCOMM '02, Pittsburgh, PA, pp. 133--145, 2002. Google ScholarDigital Library
G. Samorodnitsky and M. S. Taqqu. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York, 1994.Google Scholar
J. W. Stewart III. BGP4: Inter-Domain ROuting in the Internet. Addison-Wesley, Boston, 1999. Google ScholarDigital Library
M.S. Taqqu, W. Willinger, and R. Sherman. Proof of a fundamental result in self-similar traffic modeling, Computer Communication Review 27, pp. 5--23, 1997. Google ScholarDigital Library
R. Teixeira, K. Marzullo, S. Savage, and G.M. Voelker. In Search of Path Diversity in ISP Networks, In Proc. IMC'03, Miami Beach, Florida. October 27-29, 2003. Google ScholarDigital Library
J. W. Tukey. Data analysis and behavioral science or learning to bear the quantitative's man burden by shunning badmandments, in: The Collected Works of John W. Tukey, L. W. Jones (Ed.), Vol. III, Wadsworth, Monterey, CA, 1986.Google Scholar
A. Veres and M. Boda. The chaotic nature of TCP congestion control. Proc. IEEE INFOCOM '00, 2000.Google ScholarCross Ref
W. Willinger, R. Govindan, S. Jamin, V. Paxson and S. Shenker. Scaling phenomena in the Internet: Critically examining criticality. Proc. Nat. Acad. Sci. 99, suppl. 1, pp. 2573--2580, 2002.Google Scholar
W. Willinger and V. Paxson, Where Mathematics meets the Internet, Notices of the AMS 45, pp. 961--970, 1998.Google Scholar
S.-H. Yook, H. Jeong, and A.-L. Barabási. Modeling the Internet's large-scale topology. Proc. Nat. Acad. Sci. 99, pp. 13382--13386, 2002.Google ScholarCross Ref
X. Zhu, J. Yu, and J. C. Doyle. Heavy tails, generalized coding, and optimal Web layout. Proc. IEEE INFOCOM '01, 2001.Google Scholar

Index Terms

A pragmatic approach to dealing with high-variability in network measurements
1. Mathematics of computing
  1. Probability and statistics
    1. Distribution functions

Recommendations

Second-order heavy-tailed distributions and tail analysis

This correspondence studies the second-order distributions of heavy-tail distributed random variables (RVs). Two models for the heavy-tailed distributions are considered: power law and epsi-contaminated distributions. Special cases of the models ...
Read More
Random Walk with a Heavy-Tailed Jump Distribution

The classical random walk of which the one-step displacement variable u has a first finite negative moment is considered. The R.W. possesses an unique stationary distribution; x is a random variable with this distribution. It is assumed that the right-...
Read More
Reliability for extreme value distributions

In the area of stress-strength models there has been a large amount of work as regards estimation of the reliability R = Pr(X"2 < X"1) when X"1 and X"2 are independent random variables belonging to the same univariate family of distributions. The ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
IMC '04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
October 2004
386 pages
ISBN:1581138210
DOI:10.1145/1028788
General Chair:
Alfio Lombardo
U. Catania
,
Program Chair:
Jim Kurose
University of Massachusetts
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 October 2004
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Pareto distributions
borrowing strength
heavy-tailed distributions
high variability
lognormal distributions
model consistency
model robustness
power-laws
scaling distributions
sequential moment plots
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate277of1,083submissions,26%
Upcoming Conference
IMC '24

Sponsor:

sigcomm

sigcomm

ACM Internet Measurement Conference

November 4 - 6, 2024

Madrid , AA , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 37
  Total Citations
  View Citations
- 694
  Total Downloads
- Downloads (Last 12 months)5
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A pragmatic approach to dealing with high-variability in network measurements

IMC '04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement

ABSTRACT

References

Cited By

Index Terms

Recommendations

Second-order heavy-tailed distributions and tail analysis

Random Walk with a Heavy-Tailed Jump Distribution

Reliability for extreme value distributions

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

A pragmatic approach to dealing with high-variability in network measurements

IMC '04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement

ABSTRACT

References

Cited By

Index Terms

Recommendations

Second-order heavy-tailed distributions and tail analysis

Random Walk with a Heavy-Tailed Jump Distribution

Reliability for extreme value distributions

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media