ABSTRACT
We present the design of Espresso, Google's SDN-based Internet peering edge routing infrastructure. This architecture grew out of a need to exponentially scale the Internet edge cost-effectively and to enable application-aware routing at Internet-peering scale. Espresso utilizes commodity switches and host-based routing/packet processing to implement a novel fine-grained traffic engineering capability. Overall, Espresso provides Google a scalable peering edge that is programmable, reliable, and integrated with global traffic systems. Espresso also greatly accelerated deployment of new networking features at our peering edge. Espresso has been in production for two years and serves over 22% of Google's total traffic to the Internet.
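The abstract says Espresso combines commodity switches with host-based routing and packet processing to get fine-grained traffic engineering. The Python sketch below illustrates one plausible reading of that idea, under assumptions the abstract does not state. A host holds a table that maps destination prefixes to weighted egress peering ports, which an unmodeled central controller is assumed to fill. The host performs the longest-prefix match itself, then hashes each flow's 5-tuple so that a flow stays on one egress while aggregate traffic splits by weight. The class `HostForwardingTable`, its methods `install`, `lookup`, and `select_egress`, and port labels such as `peer-a:port1` are all hypothetical. This is not Espresso's implementation, and a production table would use a trie or a similar structure rather than a linear scan.

```python
from __future__ import annotations

import hashlib
import ipaddress
from dataclasses import dataclass, field


@dataclass
class HostForwardingTable:
    """Hypothetical host-resident routing table for fine-grained egress TE.

    Each destination prefix maps to a list of (egress_port, weight) pairs.
    A central controller (not modeled here) is assumed to install the weights;
    the host performs the longest-prefix match itself.
    """

    routes: dict = field(default_factory=dict)

    def install(self, prefix: str, egresses: list[tuple[str, int]]) -> None:
        """Install or replace the weighted egress set for one prefix."""
        if not egresses or any(weight <= 0 for _, weight in egresses):
            raise ValueError("need at least one egress with a positive weight")
        self.routes[ipaddress.ip_network(prefix)] = list(egresses)

    def lookup(self, dst: str) -> list[tuple[str, int]] | None:
        """Longest-prefix match by linear scan (a real table would use a trie)."""
        addr = ipaddress.ip_address(dst)
        best_net = None
        for net in self.routes:
            if addr.version == net.version and addr in net:
                if best_net is None or net.prefixlen > best_net.prefixlen:
                    best_net = net
        return None if best_net is None else self.routes[best_net]

    def select_egress(self, src: str, dst: str, proto: int,
                      sport: int, dport: int) -> str | None:
        """Pick an egress for one flow; the same 5-tuple always gets the same port."""
        egresses = self.lookup(dst)
        if egresses is None:
            return None  # no matching route in this sketch
        key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
        total = sum(weight for _, weight in egresses)
        # Stable hash of the 5-tuple, reduced to a bucket in [0, total).
        bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % total
        # Walk cumulative weights so each port receives about weight/total of flows.
        for port, weight in egresses:
            if bucket < weight:
                return port
            bucket -= weight
        raise AssertionError("unreachable: bucket < total by construction")
```

Consider a table that splits `203.0.113.0/24` 3:1 between two ports and pins `203.0.113.128/25` to a third. A flow to `203.0.113.200` matches the more specific /25 and always leaves via the third port, while many flows to `203.0.113.5` divide roughly 3:1 between the first two. The sketch hashes per flow rather than per packet so that packets within a single TCP connection are not reordered across different egress links.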
Index Terms
- Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering