Abstract
Datacenter transports aim to deliver low-latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that these RTTs are sufficient to estimate switch queueing. Then we describe how TIMELY can adjust transmission rates using RTT gradients to keep packet latency low while delivering high bandwidth. We implement our design in host software running over NICs with OS-bypass capabilities. We show using experiments with up to hundreds of machines on a Clos network topology that it provides excellent performance: turning on TIMELY for OS-bypass messaging over a fabric with PFC lowers 99th-percentile tail latency by 9X while maintaining near line-rate throughput. Our system also outperforms DCTCP running in an optimized kernel, reducing tail latency by 13X. To the best of our knowledge, TIMELY is the first delay-based congestion control protocol for use in the datacenter, and it achieves its results despite having an order of magnitude fewer RTT signals (due to NIC offload) than earlier delay-based schemes such as Vegas.
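The core idea of adjusting rate from the RTT gradient can be illustrated with a minimal sketch: smooth the per-sample RTT difference, normalize it by an assumed propagation RTT, then increase additively when the gradient is flat or negative and back off multiplicatively when it is positive. All constants below (the thresholds, step size, and smoothing/decrease factors) are illustrative placeholders, not the paper's tuned values, and the threshold/clamping details are simplifications of the published algorithm.

```python
# Hedged sketch of an RTT-gradient rate controller in the spirit of TIMELY.
# All parameter values are hypothetical, chosen only for illustration.

T_LOW_US = 50.0      # below this RTT, queues are assumed empty: always increase
T_HIGH_US = 500.0    # above this RTT, latency is too high: always decrease
MIN_RTT_US = 20.0    # assumed propagation RTT, used to normalize the gradient
ADD_STEP = 10.0      # additive increase step in Mbps
BETA = 0.8           # multiplicative decrease factor
ALPHA = 0.875        # EWMA weight for smoothing RTT differences

class GradientRateController:
    def __init__(self, rate_mbps):
        self.rate = rate_mbps
        self.prev_rtt = None
        self.rtt_diff = 0.0  # smoothed RTT difference, in microseconds

    def on_rtt_sample(self, rtt_us):
        """Update the sending rate from one new RTT sample; return the rate."""
        if self.prev_rtt is None:
            self.prev_rtt = rtt_us
            return self.rate
        new_diff = rtt_us - self.prev_rtt
        self.prev_rtt = rtt_us
        # EWMA-smooth the per-sample RTT difference to filter noise.
        self.rtt_diff = (1 - ALPHA) * self.rtt_diff + ALPHA * new_diff
        # Dimensionless gradient: how fast queueing delay is growing.
        gradient = self.rtt_diff / MIN_RTT_US
        if rtt_us < T_LOW_US:
            self.rate += ADD_STEP                 # no queueing: probe upward
        elif rtt_us > T_HIGH_US:
            self.rate *= 1 - BETA * (1 - T_HIGH_US / rtt_us)  # cap latency
        elif gradient <= 0:
            self.rate += ADD_STEP                 # RTT flat or falling: increase
        else:
            # RTT rising: back off proportionally to the (clamped) gradient.
            self.rate *= 1 - BETA * min(gradient, 1.0)
        return self.rate
```

The gradient clamp keeps a single noisy sample from driving the rate negative; a production controller would also enforce a minimum rate and pace transmissions, details this sketch omits.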
References
- Chelsio T5 Packet Rate Performance Report. http://goo.gl/3jJL6p, pg. 2.
- Data Center Bridging Task Group. http://www.ieee802.org/1/pages/dcbridges.html.
- Dual Port 10 Gigabit Server Adapter with Precision Time Stamping. http://goo.gl/VtL5oO.
- Gnuplot documentation. http://goo.gl/4sgrUU, pg. 48.
- Mellanox for Linux. http://goo.gl/u44Xea.
- The NetFPGA Project. http://netfpga.org/.
- TSO Sizing and the FQ Scheduler. http://lwn.net/Articles/564978/.
- Using Hardware Timestamps with PF RING. http://goo.gl/oJtHCe, 2011.
- Who (Really) Needs Sub-microsecond Packet Timestamps? http://goo.gl/TI3r1u, 2013.
- A. Kabbani et al. AF-QCN: Approximate Fairness with Quantized Congestion Notification for Multi-tenanted Data Centers. In Hot Interconnects '10.
- A. Kabbani et al. FlowBender: Flow-level Adaptive Routing for Improved Latency and Throughput in Datacenter Networks. In ACM CoNEXT '14.
- A. Singh et al. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network. In SIGCOMM '15.
- M. Alizadeh et al. CONGA: Distributed Congestion-Aware Load Balancing for Datacenters. In SIGCOMM '14.
- B. Stephens et al. Practical DCB for Improved Data Center Networks. In IEEE INFOCOM '14.
- B. Vamanan et al. Deadline-aware Datacenter TCP (D2TCP). In SIGCOMM '12.
- L. Brakmo et al. TCP Vegas: New Techniques for Congestion Detection and Avoidance. In SIGCOMM '94.
- C. Lee et al. Accurate Latency-based Congestion Feedback for Datacenters. In USENIX ATC '15.
- C.-Y. Hong et al. Finishing Flows Quickly with Preemptive Scheduling. In SIGCOMM '12.
- D.-M. Chiu and R. Jain. Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks. Comput. Netw. ISDN Syst., 1989.
- D. Zats et al. DeTail: Reducing the Flow Completion Time Tail in Datacenter Networks. In SIGCOMM '12.
- J. Dean and L. A. Barroso. The Tail at Scale. Communications of the ACM, 56:74--80, 2013.
- S. Floyd. TCP and Explicit Congestion Notification. ACM SIGCOMM CCR, 24(5), 1994.
- S. Floyd and V. Jacobson. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Trans. Netw., 1, August 1993.
- I. Grigorik. Optimizing the Critical Rendering Path. http://goo.gl/DvFfGo, Velocity Conference 2013.
- D. A. Hayes and G. Armitage. Revisiting TCP Congestion Control using Delay Gradients. In IFIP Networking, 2011.
- D. A. Hayes and D. Ros. Delay-based Congestion Control for Low Latency.
- C. Hollot, V. Misra, D. Towsley, and W.-B. Gong. A Control Theoretic Analysis of RED. In IEEE INFOCOM '01.
- C. Hollot, V. Misra, D. Towsley, and W.-B. Gong. On Designing Improved Controllers for AQM Routers Supporting TCP Flows. In IEEE INFOCOM '01.
- IEEE. 802.1Qau - Congestion Notification. http://www.ieee802.org/1/pages/802.1au.html.
- J. Perry et al. Fastpass: A Centralized "Zero-Queue" Datacenter Network. In SIGCOMM '14.
- R. Jain, D. Chiu, and W. Hawe. A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems. DEC Research Report TR-301, 1984.
- D. Katabi, M. Handley, and C. Rohrs. Internet Congestion Control for Future High Bandwidth-Delay Product Environments. In SIGCOMM '02.
- F. P. Kelly, G. Raina, and T. Voice. Stability and Fairness of Explicit Congestion Control with Small Buffers. Computer Communication Review, 2008.
- M. Al-Fares et al. A Scalable, Commodity Data Center Network Architecture. In SIGCOMM '08.
- M. Alizadeh et al. Data Center TCP (DCTCP). In SIGCOMM '10.
- M. Alizadeh et al. Data Center Transport Mechanisms: Congestion Control Theory and IEEE Standardization. In Annual Allerton Conference '08.
- M. Alizadeh et al. Less Is More: Trading a Little Bandwidth for Ultra-Low Latency in the Data Center. In NSDI '12.
- M. Alizadeh et al. Deconstructing Datacenter Packet Transport. In ACM HotNets, 2012.
- N. Dukkipati et al. Processor Sharing Flows in the Internet. In IWQoS, 2005.
- K. Nichols and V. Jacobson. Controlling Queue Delay. Queue, 10(5):20:20--20:34, May 2012.
- J. Postel. Transmission Control Protocol. RFC 793, 1981. Updated by RFCs 1122, 3168, 6093, 6528.
- S. Ha et al. CUBIC: A New TCP-Friendly High-Speed TCP Variant. SIGOPS Operating Systems Review, 2008.
- S. Radhakrishnan et al. SENIC: Scalable NIC for End-host Rate Limiting. In NSDI '14.
- K. Tan and J. Song. A Compound TCP Approach for High-speed and Long Distance Networks. In IEEE INFOCOM '06.
- V. Vasudevan et al. Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication. In SIGCOMM '09.
- D. X. Wei, C. Jin, S. H. Low, and S. Hegde. FAST TCP: Motivation, Architecture, Algorithms, Performance. IEEE/ACM Trans. Netw., 2006.
- C. Wilson, H. Ballani, T. Karagiannis, and A. Rowtron. Better Never than Late: Meeting Deadlines in Datacenter Networks. In SIGCOMM '11.
- Y. Zhu et al. Congestion Control for Large-Scale RDMA Deployments. In SIGCOMM '15.