ABSTRACT
Measuring the reliability of edge networks in the Internet is difficult due to the size and heterogeneity of networks, the rarity of outages, and the difficulty of finding vantage points that can accurately capture such events at scale. In this paper, we use logs from a major CDN that detail hourly request counts from address blocks. We discovered that in many edge address blocks, devices collectively contact the CDN every hour, over weeks and months. We establish that a sudden, temporary absence of these requests indicates a loss of Internet connectivity for those address blocks, events we call disruptions.
We develop a disruption detection technique and present broad and detailed statistics on 1.5M disruption events over the course of a year. Our approach reveals that disruptions do not necessarily reflect actual service outages, but can be the result of prefix migrations. Major natural disasters are clearly represented in our data, as expected; however, a large share of detected disruptions correlate well with planned human intervention during scheduled maintenance intervals, and are thus unlikely to be caused by external factors. Cross-evaluating our results, we find that current state-of-the-art active outage detection over-estimates the occurrence of disruptions in some address blocks. Our observations of disruptions, service outages, and the different causes of such events yield implications for the design of outage detection systems, as well as for policymakers seeking to establish reporting requirements for Internet services.
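To make the core signal concrete, the following is a minimal illustrative sketch, not the detection technique developed in the paper, whose details are not given in this abstract. It assumes a per-block list of hourly request counts and treats a run of zero-request hours as a disruption only if it follows a sustained period of hourly activity and is followed by renewed activity. The function name `find_disruptions` and the `baseline_hours` parameter (and its default of one week) are hypothetical choices for illustration.

```python
from typing import List, Tuple


def find_disruptions(
    hourly_counts: List[int],
    baseline_hours: int = 168,  # hypothetical: one week of continuous hourly activity
) -> List[Tuple[int, int]]:
    """Return (start, end) hour indices, end exclusive, of temporary gaps.

    A gap qualifies only if it is preceded by at least `baseline_hours`
    consecutive hours with requests and is followed by renewed activity.
    """
    events: List[Tuple[int, int]] = []
    active_streak = 0  # consecutive hours with at least one request
    i, n = 0, len(hourly_counts)
    while i < n:
        if hourly_counts[i] > 0:
            active_streak += 1
            i += 1
            continue
        # Found a zero-request hour: scan to the end of this silent run.
        j = i
        while j < n and hourly_counts[j] == 0:
            j += 1
        # Require an established baseline before the gap and recovery after it
        # (j < n means activity resumed within the observed window).
        if active_streak >= baseline_hours and j < n:
            events.append((i, j))
        active_streak = 0
        i = j
    return events


# Example with a short 4-hour baseline: the first gap is temporary and counted,
# while the trailing gap has not yet recovered and is not.
counts = [5] * 6 + [0] * 3 + [4] * 6 + [0] * 2
events = find_disruptions(counts, baseline_hours=4)
```

Requiring a baseline of continuous activity before a gap mirrors the observation that only blocks which contact the CDN every hour give a meaningful absence signal. Requiring recovery mirrors the "temporary" qualifier, although, as noted above, such a gap may still reflect a prefix migration rather than a service outage.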
Index Terms
- Advancing the Art of Internet Edge Outage Detection