Engineering Egress with Edge Fabric: Steering Oceans of Content to the World

ABSTRACT
Large content providers build points of presence around the world, each connected to tens or hundreds of networks. Ideally, this connectivity lets providers serve users better, but providers cannot obtain enough capacity on some preferred peering paths to handle peak traffic demands. These capacity constraints, together with volatile traffic and performance and the limitations of the 20-year-old BGP, make it difficult to make the best use of this connectivity.
We present Edge Fabric, an SDN-based system we built and deployed to tackle these challenges for Facebook, which serves over two billion users from dozens of points of presence on six continents. We provide the first public details on the connectivity of a provider of this scale, including its opportunities and challenges. We describe how Edge Fabric operates in near real time to avoid congesting links at the edge of Facebook's network. Our evaluation on production traffic worldwide demonstrates that Edge Fabric uses interconnections efficiently without congesting them or degrading performance. We also present real-time performance measurements of available routes and investigate incorporating them into routing decisions. Finally, we relate challenges, solutions, and lessons from four years of operating and evolving Edge Fabric.
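The abstract only summarizes how congestion at the edge is avoided. As a rough illustration of the general idea, the Python sketch below projects per-interface load from measured per-prefix rates (assuming every prefix uses its BGP-preferred route), then detours prefixes off any interface projected above a utilization threshold onto alternate routes that still have headroom. This is not Edge Fabric's implementation. The `Prefix` type, the `allocate` function, the 0.95 threshold, and the largest-prefix-first detour order are all illustrative assumptions. A real controller would also have to push the chosen overrides to the routers (for example, as BGP route announcements) rather than just return them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Prefix:
    """Hypothetical record for one destination prefix at a point of presence."""
    name: str
    rate_gbps: float   # measured egress traffic rate toward this prefix
    routes: list[str]  # egress interfaces in BGP preference order (best first)


def project_load(prefixes: list[Prefix], assignment: dict[str, str]) -> dict[str, float]:
    """Sum measured prefix rates per interface under a prefix -> interface assignment."""
    load: dict[str, float] = {}
    for p in prefixes:
        iface = assignment[p.name]
        load[iface] = load.get(iface, 0.0) + p.rate_gbps
    return load


def allocate(prefixes: list[Prefix], capacity_gbps: dict[str, float],
             threshold: float = 0.95) -> tuple[dict[str, str], dict[str, float]]:
    """Greedy detour sketch (assumed policy, not the paper's exact algorithm).

    Returns (overrides, projected_load), where overrides maps each detoured
    prefix to the alternate interface it was moved to.
    """
    # Start from BGP's choice: every prefix on its most-preferred route.
    assignment = {p.name: p.routes[0] for p in prefixes}
    load = project_load(prefixes, assignment)
    overrides: dict[str, str] = {}

    for iface in list(load):
        limit = threshold * capacity_gbps[iface]
        # Prefixes currently on this interface, largest first (an assumed ordering).
        on_iface = sorted((p for p in prefixes if assignment[p.name] == iface),
                          key=lambda p: p.rate_gbps, reverse=True)
        for p in on_iface:
            if load[iface] <= limit:
                break  # interface is no longer projected to be overloaded
            # Try less-preferred routes in BGP order; only move if they have headroom.
            for alt in p.routes[1:]:
                if load.get(alt, 0.0) + p.rate_gbps <= threshold * capacity_gbps[alt]:
                    load[iface] -= p.rate_gbps
                    load[alt] = load.get(alt, 0.0) + p.rate_gbps
                    assignment[p.name] = alt
                    overrides[p.name] = alt
                    break
    return overrides, load
```

For example, suppose a hypothetical 10 Gbps peering interface `peer1` would carry 11 Gbps on BGP's preferred routes. The sketch moves the largest prefix on it to a `transit1` interface with spare capacity and leaves every other prefix on its preferred route. Detouring only the excess, rather than rerouting everything, keeps most traffic on the paths BGP already prefers.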