DOI: 10.1145/3098822.3098825

Re-architecting datacenter networks and stacks for low latency and high performance

Published: 7 August 2017

ABSTRACT

Modern datacenter networks provide very high capacity via redundant Clos topologies and low switch latency, but transport protocols rarely deliver matching performance. We present NDP, a novel datacenter transport architecture that achieves near-optimal completion times for short transfers and high flow throughput in a wide range of scenarios, including incast. NDP switch buffers are very shallow, and when they fill, the switches trim packets to headers and priority-forward the headers. This gives receivers a full view of instantaneous demand from all senders, and is the basis for our novel, high-performance, multipath-aware transport protocol that can deal gracefully with massive incast events and prioritize traffic from different senders on RTT timescales. We implemented NDP in Linux hosts with DPDK, in a software switch, in a NetFPGA-based hardware switch, and in P4. We evaluate NDP's performance in our implementations and in large-scale simulations, simultaneously demonstrating support for very low latency and high throughput.
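
The trimming behaviour described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class and field names (`Packet`, `TrimmingPort`), the 64-byte header size, the default queue limit, the unbounded header queue, and the use of strict priority for headers are all assumptions made for illustration. The abstract states only that buffers are shallow, that overflowing packets are trimmed to headers, and that headers are priority forwarded.

```python
from collections import deque
from dataclasses import dataclass, replace

HEADER_BYTES = 64  # assumed header size, for illustration only


@dataclass(frozen=True)
class Packet:
    src: str
    seq: int
    size: int              # bytes on the wire
    trimmed: bool = False  # True once the payload has been cut off


class TrimmingPort:
    """One output port: a shallow data queue plus a priority header queue."""

    def __init__(self, data_capacity: int = 8):
        self.data_capacity = data_capacity  # assumed limit, in packets
        self.data_q = deque()
        self.header_q = deque()  # left unbounded here for simplicity

    def enqueue(self, pkt: Packet) -> None:
        if len(self.data_q) < self.data_capacity:
            self.data_q.append(pkt)
        else:
            # Data queue is full: keep only the header instead of dropping.
            self.header_q.append(replace(pkt, size=HEADER_BYTES, trimmed=True))

    def dequeue(self):
        # Headers leave ahead of full packets (strict priority in this sketch).
        if self.header_q:
            return self.header_q.popleft()
        if self.data_q:
            return self.data_q.popleft()
        return None


if __name__ == "__main__":
    port = TrimmingPort(data_capacity=2)
    for i in range(4):  # four senders burst at once, as in a small incast
        port.enqueue(Packet(src=f"s{i}", seq=0, size=1500))
    while (p := port.dequeue()) is not None:
        print(p.src, p.size, "trimmed" if p.trimmed else "full")
```

In this toy incast, the receiver still hears from every sender: the packets from `s2` and `s3` arrive as headers ahead of the full packets from `s0` and `s1`, so the receiver learns which senders have pending data even though the buffer overflowed.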

Published in

SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication
August 2017, 515 pages
ISBN: 9781450346535
DOI: 10.1145/3098822

Copyright © 2017 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 554 of 3,547 submissions, 16%
