ABSTRACT
Private WANs are increasingly important to the operation of enterprises, telecoms, and cloud providers. For example, B4, Google's private software-defined WAN, is larger and growing faster than our connectivity to the public Internet. In this paper, we present the five-year evolution of B4. We describe the techniques we employed to incrementally move from offering best-effort content-copy services to carrier-grade availability, while concurrently scaling B4 to accommodate 100x more traffic. Our key challenge is balancing the tension introduced by the hierarchy required for scalability, the partitioning required for availability, and the capacity asymmetry inherent to the construction and operation of any large-scale network. We discuss our approach to managing this tension: i) we design a custom hierarchical network topology for both horizontal and vertical software scaling, ii) we manage the capacity asymmetry inherent in hierarchical topologies using a novel traffic engineering algorithm without packet encapsulation, and iii) we re-architect switch forwarding rules via two-stage matching/hashing to deal with asymmetric network failures at scale.
Index Terms
- B4 and after: managing hierarchy, partitioning, and asymmetry for availability and scale in Google's software-defined WAN