
The automatic improvement of locality in storage systems

Published: 01 November 2005

Abstract

Disk I/O is increasingly the performance bottleneck in computer systems despite rapidly increasing disk data transfer rates. In this article, we propose Automatic Locality-Improving Storage (ALIS), an introspective storage system that automatically reorganizes selected disk blocks based on the dynamic reference stream to increase effective storage performance. ALIS is based on the observations that sequential data fetch is far more efficient than random access, that reducing seek distances alone produces only marginal performance improvements, and that the increasingly powerful processors and large memories in storage systems have ample capacity to reorganize the data layout and redirect the accesses so as to take advantage of rapid sequential data transfer. Using trace-driven simulation with a large set of real workloads, we demonstrate that ALIS considerably outperforms prior techniques, improving the average read performance by up to 50% for server workloads and by about 15% for personal computer workloads. We also show that the performance improvement persists as disk technology evolves. Since disk performance in practice is increasing by only about 8% per year, the benefit of ALIS may correspond to as much as several years of technological progress.
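To make the mechanism concrete, here is a minimal Python sketch of the heat-clustering idea described in the reviews below: count block accesses, then copy the hottest blocks into a contiguous reorganized region and redirect reads there. Everything here is illustrative; the class and method names, the in-memory counter, and the fixed-size region are simplifying assumptions, not ALIS's actual data structures or algorithm.

```python
from collections import Counter

class HeatClusteringSketch:
    """Toy model of heat-based block rearrangement (illustrative only)."""

    def __init__(self, region_start, region_size):
        self.heat = Counter()      # block address -> access count
        self.remap = {}            # original block -> reorganized copy
        self.region_start = region_start
        self.region_size = region_size

    def record_access(self, block):
        # Observe the dynamic reference stream.
        self.heat[block] += 1

    def reorganize(self):
        # Copy the hottest blocks, in heat order, into a contiguous region,
        # so reading one hot block sequentially prefetches its neighbors.
        hottest = [b for b, _ in self.heat.most_common(self.region_size)]
        self.remap = {b: self.region_start + i for i, b in enumerate(hottest)}

    def resolve(self, block):
        # Redirect a request to the reorganized copy when one exists.
        return self.remap.get(block, block)
```

In a real system, the reorganization pass would run in the background during idle time, and writes would need to keep the original blocks and their reorganized copies consistent.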


Reviews

Veronica Lagrange

Based on the observation that sequential data fetch is more efficient than random access, the authors propose automatic techniques and heuristics to physically reorganize disk blocks. Three heuristics are explained in great detail and evaluated with the help of trace-driven simulations: heat clustering, run clustering, and a combination of both. The heat heuristic locates hot spots on the disk and copies the blocks from those spots to adjacent locations. The run heuristic is based on the observation that some workloads contain long read sequences that are often repeated; these sequences are likewise copied to adjacent locations. All three heuristics are fully analyzed and evaluated for two different storage systems.

Two sets of real-life workloads provide input to the simulation: one contains traces from 14 personal computers used by different types of users, and the other contains traces from three servers (a file server, a time-sharing system, and a database server). The simulation is divided into two steps, "collect data and apply heuristics" and production, with performance data collected during the production step. Results differ across workloads: improvements in average read response time range from negative to as much as 50 percent. There is, however, no data on the overhead incurred by the system during the block reorganization step, which is assumed to cost very little and to be completed during the system's idle time.
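The run heuristic described above can be sketched roughly as follows, assuming a simple pair-counting approach. The function name, the min_repeats threshold, and the single-successor simplification are assumptions made for illustration, not the paper's actual run-discovery algorithm, which must also cope with interleaved requests.

```python
from collections import Counter

def find_repeated_runs(trace, min_repeats=2):
    """Rough sketch of run discovery: count how often each (block, next)
    pair occurs consecutively in the read trace, then chain pairs that
    repeat at least min_repeats times into candidate runs that could be
    copied out to a contiguous location."""
    pair_counts = Counter(zip(trace, trace[1:]))
    # Keep one frequent successor per block (a deliberate simplification).
    successor = {a: b for (a, b), n in pair_counts.items() if n >= min_repeats}

    runs, seen = [], set()
    for start in trace:
        if start in seen or start not in successor:
            continue
        run, block = [start], start
        # Follow the chain of frequent successors, avoiding cycles.
        while block in successor and successor[block] not in run:
            block = successor[block]
            run.append(block)
        if len(run) > 1:
            runs.append(run)
            seen.update(run)
    return runs
```

For a trace such as [1, 2, 3, 9, 1, 2, 3, 7, 1, 2, 3], this returns the repeated run [1, 2, 3], which would then be laid out contiguously so that it can be fetched with a single sequential transfer.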

Jingping Long

Modern computer systems store data in a hierarchy: registers, level 1 cache, level 2 cache, main memory, and hard disk. The hard disk has the largest capacity but the poorest read and write (I/O) performance, and significant research effort has gone into easing this bottleneck. Part of that effort focuses on improving data locality on the disk. The value of doing so rests on a general observation: related data items tend to be used together. If, when the system fetches a requested item from the disk, it also loads (prefetches) the related items into a faster level of the hierarchy (say, the level 2 cache), there is a good chance that the processing unit's next request will be served from the level 2 cache rather than from the disk, and I/O speed will improve. Prefetching is naturally more effective when the related items are stored as close together as possible; in other words, improving data locality can be expected to improve I/O performance. To design a scheme for improving data locality, it is important to correctly identify the relations among data items, if such relations exist.

The authors point out that, in many cases, related data items are not only accessed within a short time period, but also in a predictable order. Hence, they argue, related items should be laid out not only spatially close to each other, but also in the original access order, and they believe this second point is, to a large extent, the more important one. This is the paper's main contribution: previous work on disk data locality has mostly tried to store related items close together, often breaking the original order. Based on this idea, the authors design a system that automatically improves data locality on disk, and they test it using a software simulator driven by actual disk workload traces.

Although the authors demonstrate some promising results, their conclusion that the system can significantly improve disk I/O in general lacks support. An obvious reason is that, in many cases, there may be no relations among the data items; it is also quite common for the order in which related items are accessed to be unpredictable. Moreover, the workload traces the authors test all come from office computers used by professionals; the results would have been more valuable had they also included traces recorded on household personal computers, which make up the majority of computers in the world.
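The reviewer's central point, that preserving access order matters beyond mere proximity, can be made concrete with a toy cost model. The constants and layouts below are invented for illustration and are not measurements or parameters from the paper.

```python
def read_cost(layout_pos, access_order, seek_cost=5.0, transfer_cost=1.0):
    """Toy cost model: every access pays transfer_cost, plus seek_cost
    whenever the next block is not physically adjacent to the previous."""
    cost, prev = 0.0, None
    for block in access_order:
        pos = layout_pos[block]
        if prev is not None and pos != prev + 1:
            cost += seek_cost
        cost += transfer_cost
        prev = pos
    return cost

access_order = ["a", "b", "c", "d"]
in_order  = {"a": 0, "b": 1, "c": 2, "d": 3}   # close together, access order kept
clustered = {"a": 0, "c": 1, "b": 2, "d": 3}   # close together, order broken
print(read_cost(in_order, access_order))   # 4.0: one sequential sweep
print(read_cost(clustered, access_order))  # 19.0: three seeks despite proximity
```

Under this model, the blocks in both layouts occupy the same four adjacent positions, yet the order-preserving layout is read in a single sweep while the merely clustered one pays a seek on nearly every request.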
