Abstract
Recent device hardware trends enable a new approach to the design of network server operating systems. In a traditional operating system, the kernel mediates access to device hardware by server applications to enforce process isolation as well as network and disk security. We have designed and implemented a new operating system, Arrakis, that splits the traditional role of the kernel in two. Applications have direct access to virtualized I/O devices, allowing most I/O operations to skip the kernel entirely, while the kernel is re-engineered to provide network and disk protection without kernel mediation of every operation. We describe the hardware and software changes needed to take advantage of this new abstraction, and we illustrate its power by showing improvements of 2 to 5× in latency and 9× in throughput for a popular persistent NoSQL store relative to a well-tuned Linux implementation.
Index Terms
- Arrakis: The Operating System Is the Control Plane