DOI: 10.1145/3230543.3230545
research-article

B4 and after: managing hierarchy, partitioning, and asymmetry for availability and scale in Google's software-defined WAN

Published: 07 August 2018

ABSTRACT

Private WANs are increasingly important to the operation of enterprises, telecoms, and cloud providers. For example, B4, Google's private software-defined WAN, is larger and growing faster than our connectivity to the public Internet. In this paper, we present the five-year evolution of B4. We describe the techniques we employed to incrementally move from offering best-effort content-copy services to carrier-grade availability, while concurrently scaling B4 to accommodate 100x more traffic. Our key challenge is balancing the tension introduced by hierarchy required for scalability, the partitioning required for availability, and the capacity asymmetry inherent to the construction and operation of any large-scale network. We discuss our approach to managing this tension: i) we design a custom hierarchical network topology for both horizontal and vertical software scaling, ii) we manage inherent capacity asymmetry in hierarchical topologies using a novel traffic engineering algorithm without packet encapsulation, and iii) we re-architect switch forwarding rules via two-stage matching/hashing to deal with asymmetric network failures at scale.

        Published in

          SIGCOMM '18: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication
          August 2018
          604 pages
          ISBN: 9781450355674
          DOI: 10.1145/3230543

          Copyright © 2018 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 554 of 3,547 submissions, 16%
