research-article
Open Access

Arrakis: The Operating System Is the Control Plane

Published: 02 November 2015

Abstract

Recent device hardware trends enable a new approach to the design of network server operating systems. In a traditional operating system, the kernel mediates access to device hardware by server applications to enforce process isolation as well as network and disk security. We have designed and implemented a new operating system, Arrakis, that splits the traditional role of the kernel in two. Applications have direct access to virtualized I/O devices, allowing most I/O operations to skip the kernel entirely, while the kernel is re-engineered to provide network and disk protection without kernel mediation of every operation. We describe the hardware and software changes needed to take advantage of this new abstraction, and we illustrate its power by showing improvements of 2× to 5× in latency and 9× in throughput for a popular persistent NoSQL store relative to a well-tuned Linux implementation.


        • Published in

          ACM Transactions on Computer Systems, Volume 33, Issue 4
          January 2016
          125 pages
          ISSN: 0734-2071
          EISSN: 1557-7333
          DOI: 10.1145/2841315

          Copyright © 2015 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 2 November 2015
          • Accepted: 1 August 2015
          • Received: 1 July 2015
          Published in TOCS, Volume 33, Issue 4

          Qualifiers

          • research-article
          • Research
          • Refereed
