ABSTRACT
Most operating systems provide protection and isolation to user processes, but not to critical system components such as device drivers or other system code. Consequently, failures in these components often lead to system failures. VirtuOS is an operating system that uses a new method of decomposition to protect against such failures. It exploits virtualization to isolate and protect vertical slices of existing OS kernels in separate service domains. Each service domain represents a partition of an existing kernel and implements a subset of that kernel's functionality. Unlike competing solutions that merely isolate device drivers, or that cannot protect against malicious or vulnerable code, VirtuOS provides full protection of isolated system components. VirtuOS's user library dispatches system calls directly to service domains using an exceptionless system call model, avoiding the cost of a system call trap in many cases.
We have implemented a prototype based on the Linux kernel and the Xen hypervisor. We demonstrate the viability of our approach by creating and evaluating a network service domain and a storage service domain. Our prototype can survive the failure of individual service domains while outperforming alternative approaches such as isolated driver domains, and it even exceeds the performance of native Linux for some multithreaded workloads. Thus, VirtuOS may provide a suitable basis for kernel decomposition while retaining compatibility with existing applications and good performance.
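The exceptionless system call model mentioned above can be illustrated with a small sketch. The idea is that the caller writes a request into memory shared with the service, and the service picks it up by polling, so no trap instruction is needed on the fast path. Everything below is an assumption made for illustration: the names `ring_submit`, `service_poll`, and `ring_try_collect`, the slot layout, the queue depth, and the two toy opcodes are hypothetical and are not taken from VirtuOS or from any real library interface.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SIZE 8 /* hypothetical queue depth */

/* Life cycle of one slot: FREE -> CLAIMED -> SUBMITTED -> DONE -> FREE. */
enum slot_state { SLOT_FREE, SLOT_CLAIMED, SLOT_SUBMITTED, SLOT_DONE };

/* Toy opcodes standing in for real system calls (illustrative only). */
enum opcode { OP_ADD = 1, OP_NEGATE = 2 };

struct request {
    atomic_int state; /* one of enum slot_state */
    int opcode;
    long arg0, arg1;
    long result;
};

/* In a real system this region would be shared between two domains. */
struct ring {
    struct request slots[RING_SIZE];
};

void ring_init(struct ring *r)
{
    for (int i = 0; i < RING_SIZE; i++)
        atomic_init(&r->slots[i].state, SLOT_FREE);
}

/* Caller side: post a request without trapping.
 * Returns the slot index, or -1 if every slot is in use. */
int ring_submit(struct ring *r, int opcode, long arg0, long arg1)
{
    for (int i = 0; i < RING_SIZE; i++) {
        struct request *req = &r->slots[i];
        int expected = SLOT_FREE;
        /* Claim the slot first so the service never sees a half-written request. */
        if (!atomic_compare_exchange_strong(&req->state, &expected, SLOT_CLAIMED))
            continue;
        req->opcode = opcode;
        req->arg0 = arg0;
        req->arg1 = arg1;
        /* Release store publishes the arguments together with the state change. */
        atomic_store_explicit(&req->state, SLOT_SUBMITTED, memory_order_release);
        return i;
    }
    return -1;
}

/* Service side: handle every submitted request and return how many were handled. */
int service_poll(struct ring *r)
{
    int handled = 0;
    for (int i = 0; i < RING_SIZE; i++) {
        struct request *req = &r->slots[i];
        if (atomic_load_explicit(&req->state, memory_order_acquire) != SLOT_SUBMITTED)
            continue;
        switch (req->opcode) {
        case OP_ADD:    req->result = req->arg0 + req->arg1; break;
        case OP_NEGATE: req->result = -req->arg0;            break;
        default:        req->result = -1;                    break; /* unknown call */
        }
        atomic_store_explicit(&req->state, SLOT_DONE, memory_order_release);
        handled++;
    }
    return handled;
}

/* Caller side: returns 1 and stores the result if the request has completed,
 * otherwise returns 0 and leaves *out untouched. Frees the slot on success. */
int ring_try_collect(struct ring *r, int slot, long *out)
{
    assert(slot >= 0 && slot < RING_SIZE);
    struct request *req = &r->slots[slot];
    if (atomic_load_explicit(&req->state, memory_order_acquire) != SLOT_DONE)
        return 0;
    *out = req->result;
    atomic_store_explicit(&req->state, SLOT_FREE, memory_order_release);
    return 1;
}
```

In this sketch a caller would invoke `ring_submit`, continue with other work or run another thread, and later call `ring_try_collect` until it returns 1. A practical design would presumably also need a fallback that notifies an idle service and blocks a waiting caller, which is consistent with the abstract's statement that the trap is avoided in many cases rather than all.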
Index Terms
- VirtuOS: an operating system with kernel virtualization