DOI: 10.1145/3098822.3098856

SWIFT: Predictive Fast Reroute

Published: 07 August 2017

ABSTRACT

Network operators often face the problem of remote outages in transit networks leading to significant (sometimes on the order of minutes) downtimes. The issue is that BGP, the Internet routing protocol, often converges slowly upon such outages, as large bursts of messages have to be processed and propagated router by router.

In this paper, we present SWIFT, a fast-reroute framework that enables routers to restore connectivity within a few seconds of a remote outage. SWIFT is based on two novel techniques. First, SWIFT deals with slow outage notification by predicting the overall extent of a remote failure from a few control-plane (BGP) messages. The key insight is that significant inference speed can be gained at the price of some accuracy. Second, SWIFT introduces a new data-plane encoding scheme, which enables quick and flexible updates of the affected forwarding entries. SWIFT is deployable on existing devices, without modifying BGP.
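
To make the first technique concrete, the following toy sketch shows how a likely failed AS link might be guessed from an early batch of withdrawals, and how that guess predicts further affected prefixes. It is not the SWIFT algorithm: the scoring heuristic, the function names (`links`, `infer_failed_link`, `predict_affected`) and the sample routes are all assumptions made for illustration.

```python
from collections import Counter


def links(as_path):
    """Return the set of directed AS links along an AS path."""
    return set(zip(as_path, as_path[1:]))


def infer_failed_link(routes, withdrawn):
    """Guess the failed link from the withdrawals seen so far.

    routes: dict mapping prefix -> AS path (tuple) used before the outage.
    withdrawn: set of prefixes already withdrawn.
    Toy score (an assumption, not SWIFT's metric): the share of withdrawn
    paths that cross the link, times the share of the link's paths that
    have been withdrawn.
    """
    crossed_by_withdrawn = Counter()
    crossed_by_any = Counter()
    for prefix, path in routes.items():
        for link in links(path):
            crossed_by_any[link] += 1
            if prefix in withdrawn:
                crossed_by_withdrawn[link] += 1
    if not crossed_by_withdrawn:
        return None

    def score(link):
        w = crossed_by_withdrawn[link]
        return (w / len(withdrawn)) * (w / crossed_by_any[link])

    return max(crossed_by_withdrawn, key=score)


def predict_affected(routes, failed_link):
    """Return every prefix whose pre-outage path crosses the inferred link."""
    return {p for p, path in routes.items() if failed_link in links(path)}


# Hypothetical routing table: prefix -> AS path.
ROUTES = {
    "192.0.2.0/24": (1, 2, 3, 4),
    "198.51.100.0/24": (1, 2, 3, 5),
    "203.0.113.0/24": (1, 2, 3, 6),
    "10.0.0.0/8": (1, 2, 7),
    "172.16.0.0/12": (1, 8),
}

# Only two withdrawals have arrived so far.
EARLY = {"192.0.2.0/24", "198.51.100.0/24"}
GUESS = infer_failed_link(ROUTES, EARLY)
print(GUESS, sorted(predict_affected(ROUTES, GUESS)))
```

In this example the guess is made after only two withdrawals, yet it also flags `203.0.113.0/24`, whose withdrawal has not arrived. That is the speed-for-accuracy trade-off the abstract describes: acting early means the guess may occasionally be wrong.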

We present a complete implementation of SWIFT and demonstrate that it is both fast and accurate. In our experiments with real BGP traces, SWIFT predicts the extent of a remote outage within a few seconds with an accuracy of ~90% and can restore connectivity for 99% of the affected destinations.

Supplemental Material

swiftpredictivefastreroute.webm (webm, 86.4 MB)


          Published in: SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, August 2017, 515 pages
          ISBN: 9781450346535
          DOI: 10.1145/3098822

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher: Association for Computing Machinery, New York, NY, United States


          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 554 of 3,547 submissions, 16%
