skip to main content
10.1145/2815400.2815421acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
research-article

Split-level I/O scheduling

Published:04 October 2015Publication History

ABSTRACT

We introduce split-level I/O scheduling, a new framework that splits I/O scheduling logic across handlers at three layers of the storage stack: block, system call, and page cache. We demonstrate that traditional block-level I/O schedulers are unable to meet throughput, latency, and isolation goals. By utilizing the split-level framework, we build a variety of novel schedulers to readily achieve these goals: our Actually Fair Queuing scheduler reduces priority-misallocation by 28x; our Split-Deadline scheduler reduces tail latencies by 4x; our Split-Token scheduler reduces sensitivity to interference by 6x. We show that the framework is general and operates correctly with disparate file systems (ext4 and XFS). Finally, we demonstrate that split-level scheduling serves as a useful foundation for databases (SQLite and PostgreSQL), hypervisors (QEMU), and distributed file systems (HDFS), delivering improved isolation and performance in these important application scenarios.

Skip Supplemental Material Section

Supplemental Material

p474.mp4

mp4

2 GB

References

  1. CFQ (Complete Fairness Queueing). https://www.kernel.org/doc/Documentation/block/cfq-iosched.txt.Google ScholarGoogle Scholar
  2. Database/kernel community topic at collaboration summit 2014. http://www.postgresql.org/message-id/[email protected].Google ScholarGoogle Scholar
  3. Deadline IO scheduler tunables. https://www.kernel.org/doc/Documentation/block/deadline-iosched.txt.Google ScholarGoogle Scholar
  4. Documentation for pgbench. http://http://www.postgresql.org/docs/9.4/static/pgbench.html.Google ScholarGoogle Scholar
  5. Documentation for /proc/sys/vm/*. https://www.kernel.org/doc/Documentation/sysctl/vm.txt.Google ScholarGoogle Scholar
  6. Inside the Windows Vista Kernel: Part 1. http://technet.microsoft.com/en-us/magazine/2007.02.vistakernel.aspx.Google ScholarGoogle Scholar
  7. ionice(1) - Linux man page. http://linux.die.net/man/1/ionice.Google ScholarGoogle Scholar
  8. Notes on the Generic Block Layer Rewrite in Linux 2.5. https://www.kernel.org/doc/Documentation/block/biodoc.txt.Google ScholarGoogle Scholar
  9. pgsql-hackers maillist communication. http://www.postgresql.org/message-id/CA+Tgmobv6gm6SzHx8e2w-0180+jHbCNYbAot9KyzG_3DxRYxaw@mail.gmail.com.Google ScholarGoogle Scholar
  10. Postgresql 9.2.9 documentation. http://www.postgresql.org/docs/9.2/static/wal-configuration.html.Google ScholarGoogle Scholar
  11. Anand, A., Sen, S., Krioukov, A., Popovici, F., Akella, A., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H., and Banerjee, S. Avoiding File System Micromanagement with Range Writes. In OSDI '08 (San Diego, CA, December 2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. Information and Control in Gray-Box Systems. In ACM SIGOPS Operating Systems Review (2001), vol. 35, ACM, pp. 43--56. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Arpaci-Dusseau, R. H., and Arpaci-Dusseau, A. C. Operating Systems: Three Easy Pieces. Arpaci-Dusseau Books, 2014.Google ScholarGoogle Scholar
  14. Banga, G., Druschel, P., and Mogul, J. C. Resource containers: A new facility for resource management in server systems. In OSDI (1999), vol. 99, pp. 45--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Best, S. JFS Overview. http://jfs.sourceforge.net/project/pub/jfs.pdf, 2000.Google ScholarGoogle Scholar
  16. Bonwick, J., and Moore, B. ZFS: The Last Word in File Systems. http://opensolaris.org/os/community/zfs/docs/zfs_last.pdf, 2007.Google ScholarGoogle Scholar
  17. Bosch, P., and Mullender, S. Real-time disk scheduling in a mixed-media file system. In Real-Time Technology and Applications Symposium, 2000. RTAS 2000. Proceedings. Sixth IEEE (2000), pp. 23--32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Craciunas, S. S., Kirsch, C. M., and Röck, H. The TAP Project: Traffic Shaping System Calls. http://tap.cs.uni-salzburg.at/downloads.html.Google ScholarGoogle Scholar
  19. Craciunas, S. S., Kirsch, C. M., and Röck, H. I/O Resource Management Through System Call Scheduling. SIGOPS Oper. Syst. Rev. 42, 5 (July 2008), 44--54. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Frost, C., Mammarella, M., Kohler, E., de los Reyes, A., Hovsepian, S., Matsuoka, A., and Zhang, L. Generalized File System Dependencies. In SOSP '07 (Stevenson, WA, October 2007), pp. 307--320. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ganger, G. R., McKusick, M. K., Soules, C. A., and Patt, Y. N. Soft Updates: A Solution to the Metadata Update Problem in File Systems. ACM Transactions on Computer Systems (TOCS) 18, 2 (2000), 127--153. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Gu, W., Kalbarczyk, Z., Iyer, R. K., and Yang, Z. Characterization of Linux Kernel Behavior Under Errors. In DSN '03 (San Francisco, CA, June 2003), pp. 459--468.Google ScholarGoogle Scholar
  23. Gulati, A., Merchant, A., and Varman, P. J. pClock: An Arrival Curve Based Approach for QoS Guarantees in Shared Storage Systems. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (New York, NY, USA, 2007), SIGMETRICS '07, ACM, pp. 13--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Gupta, D., Cherkasova, L., Gardner, R., and Vahdat, A. Enforcing performance isolation across virtual machines in xen. In Middleware 2006. Springer, 2006, pp. 342--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Hagmann, R. Reimplementing the Cedar File System Using Logging and Group Commit. In SOSP '87 (Austin, TX, November 1987). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Hipp, D. R., and Kennedy, D. SQlite, 2007.Google ScholarGoogle Scholar
  27. Hofri, M. Disk scheduling: FCFS vs.SSTF revisited. Communications of the ACM 23, 11 (1980), 645--653. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Huang, L., and Chiueh, T. Implementation of a Rotation-Latency-Sensitive Disk Scheduler. Tech. Rep. ECSL-TR81, SUNY, Stony Brook, March 2000.Google ScholarGoogle Scholar
  29. Jacobson, D. M., and Wilkes, J. Disk Scheduling Algorithms Based on Rotational Position. Tech. Rep. HPL-CSP-91-7, Hewlett Packard Laboratories, 1991.Google ScholarGoogle Scholar
  30. Kim, J., Oh, Y., Kim, E., Choi, J., Lee, D., and Noh, S. H. Disk Schedulers for Solid State Drivers. In Proceedings of the Seventh ACM International Conference on Embedded Software (New York, NY, USA, 2009), EMSOFT '09, ACM, pp. 295--304. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Lumb, C., Schindler, J., Ganger, G., Nagle, D., and Riedel, E. Towards Higher Disk Head Utilization: Extracting "Free" Bandwidth From Busy Disk Drives. In OSDI '00 (San Diego, CA, October 2000), pp. 87--102. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Lumb, C. R., Merchant, A., and Alvarez, G. A. Facade: Virtual storage devices with performance guarantees. In FAST '03 (San Francisco, CA, April 2003). Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Mason, C. The Btrfs Filesystem. oss.oracle.com/projects/btrfs/dist/documentation/btrfs-ukuug.pdf, September 2007.Google ScholarGoogle Scholar
  34. Mathur, A., Cao, M., Bhattacharya, S., Dilge, A., Tomas, A., and Vivier, L. The New Ext4 Filesystem: Current Status and Future Plans. In Ottawa Linux Symposium (OLS '07) (Ottawa, Canada, July 2007).Google ScholarGoogle Scholar
  35. Mesnier, M., Chen, F., Luo, T., and Akers, J. B. Differentiated Storage Services. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11) (Cascais, Portugal, October 2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Park, S., and Shen, K. FIOS: A fair, Efficient Flash I/O Scheduler. In FAST (2012), p. 13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Pillai, T. S., Chidambaram, V., Alagappan, R., Alkiswany, S., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14) (Broomfield, CO, October 2014). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Popovici, F. I., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. Robust, Portable I/O Scheduling with the Disk Mimic. In USENIX Annual Technical Conference, General Track (2003), pp. 297--310.Google ScholarGoogle Scholar
  39. Povzner, A., Kaldewey, T., Brandt, S., Golding, R., Wong, T. M., and Maltzahn, C. Efficient Guaranteed Disk Request Scheduling with Fahrrad. In EuroSys '08 (Glasgow, Scotland UK, March 2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Riska, A., Larkby-Lahet, J., and Riedel, E. Evaluating Block-level Optimization Through the IO Path. In USENIX '07 (Santa Clara, CA, June 2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Rizzo, L., and Checconi, F. GEOM_SCHED: A Framework for Disk Scheduling within GEOM. http://info.iet.unipi.it/~luigi/papers/20090508-geom_sched-slides.pdf.Google ScholarGoogle Scholar
  42. Rosenblum, M., and Ousterhout, J. The Design and Implementation of a Log-Structured File System. ACM Transactions on Computer Systems 10, 1 (February 1992), 26--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Ruemmler, C., and Wilkes, J. An Introduction to Disk Drive Modeling. IEEE Computer 27, 3 (March 1994), 17--28. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Seltzer, M., Chen, P., and Ousterhout, J. Disk Scheduling Revisited. In USENIX Winter '90 (Washington, DC, January 1990), pp. 313--324.Google ScholarGoogle Scholar
  45. Shenoy, P., and Vin, H. Cello: A Disk Scheduling Framework for Next-generation Operating Systems. In SIGMETRICS '98 (Madison, WI, June 1998), pp. 44--55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Shue, D., and Freedman, M. J. From application requests to Virtual IOPs: Provisioned key-value storage with Libra. In Proceedings of the Ninth European Conference on Computer Systems (2014), ACM, p. 17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Sweeney, A., Doucette, D., Hu, W., Anderson, C., Nishimoto, M., and Peck, G. Scalability in the XFS File System. In USENIX 1996 (San Diego, CA, January 1996). Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Thereska, E., Ballani, H., O'Shea, G., Karagiannis, T., Rowstron, A., Talpey, T., Black, R., and Zhu, T. IOFlow: A Software-Defined Storage Architecture. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (2013), ACM, pp. 182--196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Van Meter, R., and Gao, M. Latency Management in Storage Systems. In OSDI '00 (San Diego, CA, October 2000), pp. 103--117. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Wachs, M., Abd-El-Malek, M., Thereska, E., and Ganger, G. R. Argon: Performance insulation for shared storage servers. In FAST '07 (San Jose, CA, February 2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Waldspurger, C. A., and Weihl, W. E. Stride Scheduling: Deterministic Proportional Share Resource Management. Massachusetts Institute of Technology. Laboratory for Computer Science, 1995.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Wang, H., and Varman, P. J. Balancing fairness and efficiency in tiered storage systems with bottleneck-aware allocation. In FAST '13 (San Jose, CA, February 2014). Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Wilkes, J., Golding, R., Staelin, C., and Sullivan, T. The HP AutoRAID Hierarchical Storage System. ACM Transactions on Computer Systems 14, 1 (February 1996), 108--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Worthington, B. L., Ganger, G. R., and Patt, Y. N. Scheduling Algorithms for Modern Disk Drives. In SIGMETRICS '94 (Nashville, TN, May 1994), pp. 241--251. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Yang, J., Sar, C., and Engler, D. EXPLODE: A Lightweight, General System for Finding Serious Storage System Errors. In OSDI '06 (Seattle, WA, November 2006). Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Yang, T., Liu, T., Berger, E. D., Kaplan, S. F., and Moss, J. E. B. Redline: First Class Support for Interactivity in Commodity Operating Systems. In OSDI '08 (San Diego, CA, December 2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Yu, X., Gum, B., Chen, Y., Wang, R. Y., Li, K., Krishnamurthy, A., and Anderson, T. E. Trading Capacity for Performance in a Disk Array. In OSDI '00 (San Diego, CA, October 2000). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Split-level I/O scheduling

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SOSP '15: Proceedings of the 25th Symposium on Operating Systems Principles
              October 2015
              499 pages
              ISBN:9781450338349
              DOI:10.1145/2815400

              Copyright © 2015 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 4 October 2015

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article

              Acceptance Rates

              SOSP '15 Paper Acceptance Rate30of181submissions,17%Overall Acceptance Rate131of716submissions,18%

              Upcoming Conference

              SOSP '24

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader