ABSTRACT
Datacenter networks employ multi-rooted topologies (e.g., Leaf-Spine, Fat-Tree) to provide large bisection bandwidth. These topologies offer a large degree of multipathing and need a data-plane load-balancing mechanism to use their bisection bandwidth effectively. The canonical load-balancing mechanism is equal-cost multi-path routing (ECMP), which spreads traffic uniformly across multiple paths. Motivated by ECMP's shortcomings, congestion-aware load-balancing techniques such as CONGA have been developed. These techniques have two limitations. First, because switch memory is limited, they can maintain only a small amount of congestion-tracking state at the edge switches, and so do not scale to large topologies. Second, because they are implemented in custom hardware, they cannot be modified in the field.
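To make ECMP's congestion-oblivious behavior concrete, the sketch below picks a next hop by hashing a flow's header fields, which is how ECMP is commonly realized. It is a minimal illustration under stated assumptions: the 5-tuple key, the `ecmp_next_hop` name, and the use of SHA-256 as a stand-in for a switch's hardware hash function are all hypothetical choices, not details from the paper.

```python
import hashlib
from typing import List, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, protocol)
FiveTuple = Tuple[str, str, int, int, int]


def ecmp_next_hop(flow: FiveTuple, next_hops: List[str]) -> str:
    """Pick one equal-cost next hop by hashing the flow's 5-tuple.

    All packets of a flow hash to the same path (so they stay in order),
    but the choice never looks at how congested each path currently is.
    SHA-256 stands in for the switch's hardware hash (an assumption).
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    spines = ["spine0", "spine1", "spine2", "spine3"]
    flow = ("10.0.0.1", "10.0.1.2", 40000, 80, 6)
    print(ecmp_next_hop(flow, spines))
```

Because the hash depends only on header fields, two large flows can collide on the same path while other paths sit idle; this is the kind of shortcoming that motivates congestion-aware schemes.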
This paper presents HULA, a data-plane load-balancing algorithm that overcomes both limitations. First, instead of having the leaf switches track congestion on all paths to a destination, each HULA switch tracks congestion for the best path to a destination through a neighboring switch. Second, we design HULA for emerging programmable switches and program it in P4 to demonstrate that HULA could be run on such programmable chipsets, without requiring custom hardware. We evaluate HULA extensively in simulation, showing that it outperforms a scalable extension to CONGA in average flow completion time (1.6× at 50% load, 3× at 90% load).
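The key scalability idea, keeping one best-next-hop entry per destination rather than congestion state for every path, can be sketched as a small table. This is a simplified illustration, not the authors' P4 program. The following are all assumptions made for the sketch: probes arrive carrying a path-utilization value, path congestion is taken as the maximum link utilization along the path, and the update rule shown here. Names such as `BestHopTable`, `on_probe`, and `link_util` are hypothetical. Flowlet handling and entry aging are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BestHopTable:
    """Per-destination best next hop, a simplified HULA-style sketch.

    State is one (next_hop, path_util) entry per destination ToR,
    rather than one entry per (destination, path) pair.
    """
    # Hypothetical: current utilization (0.0-1.0) of the local link on each port.
    link_util: Dict[int, float]
    best_hop: Dict[str, int] = field(default_factory=dict)
    best_util: Dict[str, float] = field(default_factory=dict)

    def on_probe(self, dst_tor: str, in_port: int, probe_util: float) -> float:
        """Process a probe for dst_tor that arrived on in_port.

        Returns the best known path utilization to dst_tor, i.e. the value
        this switch would advertise in its own probes (assumed behavior).
        """
        # Bottleneck metric: max of the remote path and the local link.
        path_util = max(probe_util, self.link_util[in_port])
        current = self.best_hop.get(dst_tor)
        # Adopt a strictly better path, or refresh the current best hop so
        # that rising congestion on it is noticed.
        if current is None or path_util < self.best_util[dst_tor] or in_port == current:
            self.best_hop[dst_tor] = in_port
            self.best_util[dst_tor] = path_util
        return self.best_util[dst_tor]

    def next_hop(self, dst_tor: str) -> Optional[int]:
        """Port to forward new traffic for dst_tor on, if any probe was seen."""
        return self.best_hop.get(dst_tor)


if __name__ == "__main__":
    table = BestHopTable(link_util={1: 0.2, 2: 0.7})
    table.on_probe("tor9", in_port=1, probe_util=0.5)  # path util max(0.5, 0.2)
    table.on_probe("tor9", in_port=2, probe_util=0.1)  # path util max(0.1, 0.7), worse
    print(table.next_hop("tor9"))
```

Refreshing the entry whenever a probe arrives through the current best hop is one way to let the table react when that path becomes congested. Without it, a stale low utilization value could keep traffic pinned to a path that has since become worse.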
- HULA: Scalable Load Balancing Using Programmable Data Planes