DOI: 10.1145/3098822.3098854
research-article
Open Access

Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering

Published: 07 August 2017

ABSTRACT

We present the design of Espresso, Google's SDN-based Internet peering edge routing infrastructure. This architecture grew out of a need to exponentially scale the Internet edge cost-effectively and to enable application-aware routing at Internet-peering scale. Espresso utilizes commodity switches and host-based routing/packet processing to implement a novel fine-grained traffic engineering capability. Overall, Espresso provides Google a scalable peering edge that is programmable, reliable, and integrated with global traffic systems. Espresso also greatly accelerated deployment of new networking features at our peering edge. Espresso has been in production for two years and serves over 22% of Google's total traffic to the Internet.


Supplemental Material

takingtheedgeoffwithespressoscalereliabilityandprogrammabilityforglobalinternetpeering.webm (webm, 84.5 MB)


      • Published in

        SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication
        August 2017
        515 pages
        ISBN:9781450346535
        DOI:10.1145/3098822

        Copyright © 2017 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 554 of 3,547 submissions, 16%
