ABSTRACT
Traffic shaping, including pacing and rate limiting, is fundamental to the correct and efficient operation of both datacenter and wide area networks. Sample use cases include policy-based bandwidth allocation to flow aggregates, rate-based congestion control algorithms, and packet pacing to avoid bursty transmissions that can overwhelm router buffers. Driven by the need to scale to millions of flows and to apply complex policies, traffic shaping is moving from network switches into the end hosts, typically implemented in software in the kernel networking stack.
In this paper, we show that the performance overhead of end-host traffic shaping is substantial limits overall system scalability as we move to thousands of individual traffic classes per server. Measurements from production servers show that shaping at hosts consumes considerable CPU and memory, unnecessarily drops packets, suffers from head of line blocking and inaccuracy, and does not provide backpressure up the stack. We present Carousel, a framework that scales to tens of thousands of policies and flows per server, built from the synthesis of three key ideas: i) a single queue shaper using time as the basis for releasing packets, ii) fine-grained, just-in-time freeing of resources in higher layers coupled to actual packet departures, and iii) one shaper per CPU core, with lock-free coordination. Our production experience in serving video traffic at a Cloud service provider shows that Carousel shapes traffic accurately while improving overall machine CPU utilization by 8% (an improvement of 20% in the CPU utilization attributed to networking) relative to state-of-art deployments. It also conforms 10 times more accurately to target rates, and consumes two orders of magnitude less memory than existing approaches.
Supplemental Material
- 2002. pfifo-tc: PFIFO Qdisc. (2002). https://linux.die.net/man/8/tc-pfifo/.Google Scholar
- 2003. Linux Hierarchical Token Bucket. (2003). http://luxik.cdi.cz/devik/qos/htb/.Google Scholar
- 2008. net: Generic receive offload. (2008). https://lwn.net/Articles/358910/.Google Scholar
- 2012. tcp: TCP Small Queues. (2012). https://lwn.net/Articles/506237/.Google Scholar
- 2013. pkt_sched: fq: Fair Queue packet scheduler. (2013). https://lwn.net/Articles/564825/.Google Scholar
- 2013. tcp: TSO packets automatic sizing. (2013). https://lwn.net/Articles/564979/.Google Scholar
- 2016. neper: a Linux networking performance tool. (2016). https://github.com/google/neper.Google Scholar
- 2017. Leaky bucket as a queue. (2017). https://en.wikipedia.org/wiki/Leaky_bucket.Google Scholar
- Mohammad Alizadeh, Abdul Kabbani, Tom Edsall, Balaji Prabhakar, Amin Vahdat, and Masato Yasuda. 2012. Less Is More: Trading a Little Bandwidth for Ultra-Low Latency in the Data Center. In NSDI '12. 253--266.Google ScholarDigital Library
- W. Almesberger, J. H. Salim, and A. Kuznetsov. 1999. Differentiated services on Linux. In GLOBECOM '99. 831--836 vol. 1b. Google ScholarCross Ref
- Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A View of Cloud Computing. Commun. ACM 53, 4 (2010), 50--58. Google ScholarDigital Library
- Wei Bai, Kai Chen, Haitao Wu, Wuwei Lan, and Yangming Zhao. 2014. PAC: taming TCP Incast congestion using proactive ACK control. In ICNP '14. 385--396. Google ScholarDigital Library
- Hitesh Ballani, Paolo Costa, Thomas Karagiannis, and Ant Rowstron. 2011. Towards Predictable Datacenter Networks. SIGCOMM Comput. Commun. Rev. 41, 4 (Aug. 2011), 242--253. Google ScholarDigital Library
- Neda Beheshti, Yashar Ganjali, Monia Ghobadi, Nick McKeown, and Geoff Salmon. 2008. Experimental Study of Router Buffer Sizing. In IMC '08. 197--210. Google ScholarDigital Library
- Jesper Dangaard Brouer. 2015. Network stack challenges at increasing speeds: The 100Gbit/s challenge. LinuxCon North America. (2015). http://events.linuxfoundation.org/sites/events/files/slides/net_stack_challenges_100G_1.pdfGoogle Scholar
- Randy Brown. 1988. Calendar queues: a fast O(1) priority queue implementation for the simulation event set problem. Commun. ACM 31, 10 (1988), 1220--1227. Google ScholarDigital Library
- Neal Cardwell, Yuchung Cheng, C. Stephen Gunn, Van Jacobson, and Soheil Yeganeh. 2016. BBR: Congestion-Based Congestion Control. ACM Queue (2016), 50:20--50:53.Google Scholar
- Yuchung Cheng and Neal Cardwell. 2016. Making Linux TCP Fast. In Netdev Conference.Google Scholar
- David D Clark. 1985. The structuring of systems using upcalls. In SOSP '85. 171--180.Google ScholarDigital Library
- Glenn William Connery, W Paul Sherer, Gary Jaszewski, and James S Binder. 1999. Offload of TCP segmentation to a smart adapter. US Patent 5,937,169. (August 1999).Google Scholar
- Tobias Flach, Pavlos Papageorge, Andreas Terzis, Luis Pedrosa, Yuchung Cheng, Tayeb Karim, Ethan Katz-Bassett, and Ramesh Govindan. 2016. An Internet-Wide Analysis of Traffic Policing. In SIGCOMM '16. 468--482. Google ScholarDigital Library
- Yashar Ganjali and Nick McKeown. 2006. Update on Buffer Sizing in Internet Routers. SIGCOMM Comput. Commun. Rev. (2006), 67--70. Google ScholarDigital Library
- Monia Ghobadi, Yuchung Cheng, Ankur Jain, and Matt Mathis. 2012. Trickle: Rate Limiting YouTube Video Streaming. In USENIX ATC '12. 191--196.Google Scholar
- Sangjin Han, Keon Jang, Aurojit Panda, Shoumik Palkar, Dongsu Han, and Sylvia Ratnasamy. 2015. SoftNIC: A Software NIC to Augment Hardware. Technical Report UCB/EECS-2015-155. EECS Department, University of California, Berkeley.Google Scholar
- Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer. 2013. Achieving High Utilization with Software-driven WAN. SIGCOMM Comput. Commun. Rev. 43, 4 (2013), 15--26. Google ScholarDigital Library
- Intel Networking Division. 2016. 82599 10 GbE Controller Datasheet. (2016).Google Scholar
- Intel Networking Division. 2016. Ethernet Switch FM10000. (2016).Google Scholar
- Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, Jon Zolla, Urs Hölzle, Stephen Stuart, and Amin Vahdat. 2013. B4: Experience with a Globally-deployed Software Defined Wan. In SIGCOMM '13. 3--14.Google ScholarDigital Library
- Keon Jang, Justine Sherry, Hitesh Ballani, and Toby Moncaster. 2015. Silo: Predictable Message Latency in the Cloud. In SIGCOMM '15. 435--448.Google ScholarDigital Library
- Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazières, Balaji Prabhakar, Albert Greenberg, and Changhoon Kim. 2013. EyeQ: Practical Network Performance Isolation at the Edge. In NSDI '13. 297--311.Google Scholar
- Antoine Kaufmann, SImon Peter, Naveen Kr. Sharma, Thomas Anderson, and Arvind Krishnamurthy. 2016. High Performance Packet Processing with FlexNIC. In ASPLOS '16. 67--81.Google Scholar
- Alok Kumar, Sushant Jain, Uday Naik, Anand Raghuraman, Nikhil Kasinadhuni, Enrique Cauich Zermeno, C. Stephen Gunn, Jing Ai, Björn Carlin, Mihai Amarandei-Stavila, Mathieu Robin, Aspi Siganporia, Stephen Stuart, and Amin Vahdat. 2015. BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing. In SIGCOMM '15. 1--14.Google Scholar
- Gautam Kumar, Srikanth Kandula, Peter Bodik, and Ishai Menache. 2013. Virtualizing Traffic Shapers for Practical Resource Allocation. In 5th USENIX Workshop on Hot Topics in Cloud Computing.Google Scholar
- Adam Langley, Alistair Riddoch, Alyssa Wilk, Antonio Vicente, Charles Krasic, Dan Zhang, Fan Yang, Fedor Kouranov, Ian Swett, Janardhan Iyengar, Jeff Bailey, Jeremy Dorfman, Jim Roskind, Joanna Kulik, Patrik Westin, Raman Tenneti, Robbie Shade, Ryan Hamilton, Victor Vasiliev, Wan-Teh Chang, and Zhongyi Shi. 2017. The QUIC Transport Protocol: Design and Internet-Scale Deployment. In SIGCOMM '17.Google ScholarDigital Library
- Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, and Scott Shenker. 2016. Universal Packet Scheduling. In NSDI '16. 501--521.Google Scholar
- Radhika Mittal, Vinh The Lam, Nandita Dukkipati, Emily Blem, Hassan Wassel, Monia Ghobadi, Amin Vahdat, Yaogong Wang, David Wetherall, and David Zats. 2015. TIMELY: RTT-based Congestion Control for the Datacenter. In SIGCOMM '15. 537--550.Google ScholarDigital Library
- Lucian Popa, Praveen Yalagandula, Sujata Banerjee, Jeffrey C. Mogul, Yoshio Turner, and Jose Renato Santos. 2013. ElasticSwitch: Practical Work-conserving Bandwidth Guarantees for Cloud Computing. SIGCOMM Comput. Commun. Rev. 43, 4 (Aug. 2013), 351--362. Google ScholarDigital Library
- Sivasankar Radhakrishnan, Yilong Geng, Vimalkumar Jeyakumar, Abdul Kabbani, George Porter, and Amin Vahdat. 2014. SENIC: Scalable NIC for End-Host Rate Limiting. In NSDI '14.Google Scholar
- Barath Raghavan, Kashi Vishwanath, Sriram Ramabhadran, Kenneth Yocum, and Alex C. Snoeren. 2007. Cloud Control with Distributed Rate Limiting. In SIGCOMM '07. 337--348. Google ScholarDigital Library
- Rusty Russell. 2008. virtio: towards a de-facto standard for virtual I/O devices. ACM SIGOPS Operating Systems Review 42, 5 (2008), 95--103. Google ScholarDigital Library
- M. Shreedhar and George Varghese. 1995. Efficient Fair Queueing Using Deficit Round Robin. In SIGCOMM '95. 375--385.Google Scholar
- Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, and Nick McKeown. 2016. Programmable Packet Scheduling at Line Rate. In SIGCOMM '16. 44--57. Google ScholarDigital Library
- G. Varghese and T. Lauck. 1987. Hashed and Hierarchical Timing Wheels: Data Structures for the Efficient Implementation of a Timer Facility. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (SOSP '87). 25--38. Google ScholarDigital Library
- L. Zhang. 1990. Virtual Clock: A New Traffic Control Algorithm for Packet Switching Networks. In SIGCOMM '90. 19--29. Google ScholarDigital Library
Index Terms
- Carousel: Scalable Traffic Shaping at End Hosts
Recommendations
ShaperProbe: end-to-end detection of ISP traffic shaping using active methods
IMC '11: Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conferenceWe present an end-to-end measurement method for the detection of traffic shaping. Traffic shaping is typically implemented using token buckets, allowing a maximum burst of traffic to be serviced at the peak capacity of the link, while any remaining ...
Traffic shaping measurements and analyses in 3G network
HP-MOSys '13: Proceedings of the 2nd ACM workshop on High performance mobile opportunistic systemsTraffic shaping effect may have significant impact on end-to-end Quality of Service (QoS) provisioning. Therefore, it should be carefully studied in order to allow the creation of appropriate traffic models to be used for simulations. First, to ...
QoS scheduler/shaper for optical coarse packet switching IP-over-WDM networks
For IP-over-WDM networks, optical coarse packet switching (OCPS) has been proposed to circumvent optical packet switching limitations by using in-band-controlled per-burst switching and advocating traffic control enforcement to achieve high bandwidth ...
Comments