skip to main content
article

I/O reference behavior of production database workloads and the TPC benchmarks—an analysis at the logical level

Authors Info & Claims
Published:01 March 2001Publication History
Skip Abstract Section

Abstract

As improvements in processor performance continue to far outpace improvements in storage performance, I/O is increasingly the bottleneck in computer systems, especially in large database systems that manage huge amoungs of data. The key to achieving good I/O performance is to thoroughly understand its characteristics. In this article we present a comprehensive analysis of the logical I/O reference behavior of the peak productiondatabase workloads from ten of the world's largest corporations. In particular, we focus on how these workloads respond to different techniques for caching, prefetching, and write buffering. Our findings include several broadly applicable rules of thumb that describe how effective the various I/O optimization techniques are for the production workloads. For instance, our results indicate that the buffer pool miss ratio tends to be related to the ratio of buffer pool size to data size by an inverse square root rule. A similar fourth root rule relates the write miss ratio and the ration of buffer pool size to data size.

In addition, we characterize the reference characteristics of workloads similar to the Transaction Processing Performance Council (TPC) benchmarks C (TPC-C) and D(TPC-D), which are de facto standard performance measures for online transaction processing (OLTP) systems and decision support systems (DSS), respectively. Since benchmarks such as TPC-C and TPC-D can only be used effectively if their strengths and limitations are understood, a major focus of our analysis is to identify aspects of the benchmarks that stress the system differently than the production workloads. We discover that for the most part, the reference behavior of TPC-C and TPC-D fall within the range of behavior exhibited by the production workloads. However, there are some noteworthy exceptions that affect well-known I/O optimization techniques such as caching (LRU is further from the optimal for TPC-C, while there is little sharing of pages between transactions for TPC-D), prefetching (TPC-C exhibits no significant sequentiality), and write buffering (write buffering is lees effective for the TPC benchmarks). While the two TPC benchmarks generally complement one another in reflecting the characteristics of the production workloads, there remain aspects of the real workloads that are not represented by either of the benchmarks.

References

  1. AHO,A.V.,DENNING,P.J.,AND ULLMAN, J. D. 1971. Principles of optimal page replacement. J. ACM 18, 1 (Jan.), 80-93.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. ATUL, C., DONALD,H.J.,SHIBAMIYA, A., LYLE,R.W.,AND WATTS, S. J. 1988. System and method for avoiding complete index traversals in sequential and almost sequential index probes. U.S. Patent 5748952. Filed May 10, 1995. Issued May 5, 1998.]]Google ScholarGoogle Scholar
  3. BACH, M. J. 1986. The Design of the UNIX Operating System. Prentice Hall.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. BAKER,M.G.,HARTMAN, J. H., KUPFER,M.D.,SHIRRIFF,K.W.,AND OUSTERHOUT, J. K. 1991. Measurements of a distributed file system. In Proceedings of the ACM Symposium on Operating Systems Principles (Pacific Grove, CA, Oct. 1991), 198-212.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. BELADY, L. A. 1966. A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5, 2, 78-101.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. CARSON,S.D.AND SETIA, S. 1992. Analysis of the periodic update write policy for disk cache. IEEE Trans. Softw. Eng. 18, 1 (Jan.), 44-54.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. CASAS,I.R.AND SEVCIK, K. C. 1989. A buffer management model for use in predicting overall database system performance. In Proceedings of the IEEE International Conference on Data Engineering (Los Angeles, CA, Feb. 1989), 463-469.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. CHEN,C.M.AND ROUSSOPOULOS, N. 1993. Adaptive database buffer allocation using query feedback. In Proceedings of the International Conference on Very Large Data Bases ( VLDB) (Dublin, Ireland, Aug. 1993), 342-353.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. CHEN, P. M., LEE, E. K., GIBSON, G. A., KATZ,R.H.,AND PATTERSON, D. A. 1994. RAID: highperformance, reliable secondary storage. ACM Comput. Surv. 26, 2 (June), 145-185.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. CHOU,H.T.AND DEWITT, D. J. 1985. An evaluation of buffer management strategies for relational database systems. In Proceedings of the International Conference on Very Large Data Bases ( VLDB) (Stockholm, Sweden, Aug. 1985), 127-141.]]Google ScholarGoogle Scholar
  11. CODD, E. F. 1970. A relational model for large shared data banks. Commun. ACM 13, 6 (June), 377-387.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. COFFMAN,E.G.,JR.AND DENNING, P. J. 1973. Operating Systems Theory. Prentice-Hall, Inc.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. CORNELL,D.W.AND YU, P. S. 1989. Integration of buffer management and query optimization in relational database environment. In Proceedings of the International Conference on Very Large Data Bases ( VLDB) (Amsterdam, The Netherlands, Aug. 1989), 247-255.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. DAN,A.AND TOWSLEY, D. 1990. An approximate analysis of the lru and fifo buffer replacement schemes. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (Boulder, CO, May 1990), 143-52.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. DAN, A., YU,P.S.,AND CHUNG, J.-Y. 1993. Database access characterization for buffer hit prediction. In Proceedings of the IEEE International Conference on Data Engineering (Los Alamitos, CA, April 1993), 134-144.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. DOPPELHAMMER, J., HOPPLER, T., KEMPER, A., AND KOSSMANN, D. 1997. Database performance in the real world-TPC-D and SAP R/3. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Tucson, AZ, May 1997), 123-134.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. EFFELSBERG,W.AND LOOMIS, M. E. S. 1984. Logical, internal, and physical reference behavior in CODASYL database systems. ACM Trans. Database Syst. 9, 2 (June), 187-213.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. FALOUTSOS, C., NG, R., AND SELLIS, T. 1991. Predictive load control for flexible buffer allocation. In Proceedings of the International Conference on Very Large Data Bases ( VLDB) (Barcelona, Spain, Sept. 1991), 265-274.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. FLOYD,R.A.AND ELLIS, C. S. 1989. Directory reference patterns in hierarchical file systems. IEEE Trans. Knowl. Data Eng. 1, 2 (June), 238-247.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. GRAEFE, G. 1993. Query evaluation techniques for large databases. ACM Comput. Surv. 25,2 (June), 73-170.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. HAAS, L., CHANG, W., LOHMAN, G., MCPHERSON, M., WILMS, P., LAPIS, G., LINDSAY, B., PIRAHESH, H., CAREY, M., AND SHEKITA, E. 1990. Starburst mid-flight: As the dust clears. IEEE Trans. Knowl. Data Eng. 2, 1 (March), 143-160.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. HAWTHORN,P.AND STONEBRAKER, M. 1979. Performance analysis of a relational data base management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Boston, MA, May 1979), 1-12.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. HENNESSY,J.L.AND PATTERSON, D. A. 1996. Computer Architecture A Quantitative Approach. 2nd ed. Morgan Kaufmann, San Francisco, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. HILL, A. V. 1913. The combinations of haemoglobin with oxygen and carbon monoxide. Biochemistry J. 7, 471-480.]]Google ScholarGoogle ScholarCross RefCross Ref
  25. HSU,W.W.,SMITH,A.J.,AND YOUNG, H. C. 1999a. I /O reference behavior of production database workloads and the TPC benchmarks-an analysis at the logical level. Tech. Rep. CSD-99-1071, Computer Science Div., Univ. of California, Berkeley, Nov. 1999. Also available as Res. Rep. RJ 10166, IBM Almaden Research Center, San Jose, CA, Nov. 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. HSU,W.W.,SMITH,A.J.,AND YOUNG, H. C. 1999b. Results and data for 'Analysis of the I /O characteristics of production database workloads and the TPC benchmarks'. http://www. cs.berkeley.edu /~windsorh/DBChar.]]Google ScholarGoogle Scholar
  27. HSU,W.W.,SMITH,A.J.,AND YOUNG, H. C. 2000. Projecting the performance of decision support workloads on systems with smart storage (SmartSTOR). In Proceedings of the 7th IEEE International Conference on Parallel and Distributed Systems (Iwate, Japan, July 2000), 417-425. Also available as Res. Rep. RJ 10145, IBM Almaden Research Center, San Jose, CA, July 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. HSU,W.W.,SMITH,A.J.,AND YOUNG, H. C. 2001. Analysis of the characteristics of production database workloads and comparison with the TPC benchmarks. IBM Syst. J. 40, 3. To appear. Also available as Tech. Rep. CSD-99-1070, Computer Science Div., Univ. of California, Berkeley, Nov. 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. IBM CORP. 1997a. DB2 for OS/390 V5 Installation Guide.]]Google ScholarGoogle Scholar
  30. IBM CORP. 1997b. DB2 UDB V5 Administration Guide.]]Google ScholarGoogle Scholar
  31. INTEL CORP. 1999. Intel extended server memory architecture (ESMA): Overcoming the 4 GB memory barrier. http://www.intel.com/procs/servers/pentiumiii/xeon/whitepapers/ESMA. htm.]]Google ScholarGoogle Scholar
  32. JOHNSON,T.AND SHASHA, D. 1994. 2Q: A low overhead high performance buffer management replacement algorithm. In Proceedings of the International Conference on Very Large Data Bases ( VLDB) (Santiago, Chile, 1994), 439-450.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. KEARNS,J.P.AND DEFAZIO, S. 1983. Locality of reference in hierarchical database systems. IEEE Trans. Softw. Eng. 19, 2 (March), 128-134.]]Google ScholarGoogle Scholar
  34. KEARNS,J.P.AND DEFAZIO, S. 1989. Diversity in database reference behavior. Perform. Eval. Rev. 17, 1 (May), 11-19.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. KING, W. F. 1971. Analysis of paging algorithms. In Proceedings of the IFIP Congress (Ljubljana, Yugoslavia, Aug. 1971), 485-490.]]Google ScholarGoogle Scholar
  36. KNUTH, D. E. 1998. The Art of Computer Programming: Sorting and Searching. 2nd ed., Vol. 3. Addison-Wesley.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. LEUTENEGGER,S.T.AND DIAS, D. M. 1993. A modeling study of the TPC-C benchmark. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Washington, D.C., May 1993), 22-31.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. MCNUTT, B. 1991. A simple statistical model of cache reference locality, and its application to cache planning, measurement and control. In Proceedings of the CMG (Computer Measurement Group) Conference (Nashville, TN, Dec. 1991), 203-210.]]Google ScholarGoogle Scholar
  39. MCNUTT, B. 1995. MVS DASD survey: Results and trends. In Proceedings of the CMG (Computer Measurement Group) Conference (Nashville, TN, Dec. 1995), 658-667.]]Google ScholarGoogle Scholar
  40. MENON, J. 1995. A performance comparison of RAID-5 and log-structured arrays. In Proceedings of the IEEE International Symposium on High Performance Distributed Computing (Washington, DC, Aug. 1995), 167-178.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. MOGUL, J. C. 1994. A better update policy. In Proceedings of the Summer 1994 USENIX Conference (Boston, MA, June 1994), 99-111.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. MOHAN, C., HADERLE, D., LINDSAY, B., PIRAHESH, H., AND SCHWARZ, P. 1992. ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. 17, 1 (March), 94-162.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. NELSON,M.N.,WELCH,B.B.,AND OUSTERHOUT, J. K. 1988. Caching in the SPRITE network file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 134-154.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. NG,R.T.,FALOUTSOS,C.,AND SELLIS, T. K. 1991. Flexible buffer allocation based on marginal gains. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Denver, CO, May 1991), 387-396.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. NICOLA,V.F.,DAN, A., AND DIAS, D. M. 1992. Analysis of the generalized clock buffer replacement scheme for database transaction processing. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (Newport, RI, June 1992), 35-46.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. O'NEIL, E. J., O'NEIL,P.E.,AND WEIKUM, G. 1993. The LRU-K page replacement algorithm for database disk buffering. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Washington, D.C., May 1993), 297-306.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. OUSTERHOUT, J., DA COSTA, H., HARRISON, D., KUNZE, J., KUPFER, M., AND THOMPSON, J. 1985. A trace-driven analysis of the UNIX 4.2 BSD file system. In Proceedings of the ACM Symposium on Operating System Principles (Orcas Island, WA, Dec. 1985), 15-24.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. PRESS, W. H., FLANNERY,B.P.,TEUKOLSKY,S.A.,AND VETTERLING, W. T. 1990. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. RAGAZ,N.AND RODRIGUEZ-ROSELL, J. 1976. Empirical studies of storage management in a data base system. Res. Rep. RJ 1834, IBM Research Laboratory, San Jose, CA, Oct. 1976.]]Google ScholarGoogle Scholar
  50. RAMAKRISHNAN, K. K., BISWAS,P.,AND KAREDLA, R. 1992. Analysis of file I/O traces in commercial computing environments. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (Newport, RI, June 1992), 78-90.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. ROBERTSON,J.AND DEVARAKONDA, M. 1990. Data cache management using frequency-based replacement. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (Boulder, CO, May 1990), 134-142.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. RODRIGUEZ-ROSELL, J. 1976. Empirical data reference behavior in data base systems. IEEE Computer 9, 11 (Nov.), 9-13.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. ROSENBLUM,M.AND OUSTERHOUT, J. K. 1991. The design and implementation of a log-structured file system. In Proceedings of the ACM Symposium on Operating Systems Principles (Pacific Grove, CA, Oct. 1991), 1-15.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. SACCO,G.M.AND SCHKOLNICK, M. 1982. A mechanism for managing the buffer pool in a relational database system using the hot set model. In Proceedings of the International Conference on Very Large Data Bases ( VLDB) (Mexico City, Mexico, Sept. 1982), 257-262.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. SACCO,G.M.AND SCHKOLNICK, M. 1986. Buffer management in relational database systems. ACM Trans. Database Syst. 11, 4 (Dec.), 473-498.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. SELINGER,P.G.,ASTRAHAN, M. M., CHAMBERLIN,D.D.,LORIE,R.A.,AND PRICE, T. G. 1979. Access path selection in a relational database management system. In Proceedings of the ACMSIGMOD International Conference on Management of Data (Boston, MA, May 1979), 23-34.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. SINGHAL,V.AND SMITH, A. J. 1997. Analysis of locking behavior in three real database systems. VLDB J. 6, 1 (Jan.), 40-52. Extended version available as Tech. Rep. CSD-94-801, Computer Science Div., Univ. of California, Berkeley, CA, Apr. 1994.]]Google ScholarGoogle Scholar
  58. SMITH, A. J. 1976. Analysis of the optimal, look-ahead demand paging algorithms. SIAM J. Comput. 5, 4 (Dec.), 743-757.]]Google ScholarGoogle ScholarCross RefCross Ref
  59. SMITH, A. J. 1978. Sequentiality and prefetching in database systems. ACM Trans. Database Syst. 3, 3 (Sept.), 223-247.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. SMITH, A. J. 1985. Disk cache-miss ratio analysis and design considerations. ACM Trans. Comput. Syst. 3, 3 (Aug.), 161-203.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. SMITH, A. J. 1994. Trace driven simulation in research on computer architecture and operating systems. In Proceedings of the Conference on New Directions in Simulation for Manufacturing and Communications (Tokyo, Japan, Aug. 1994), 43-49.]]Google ScholarGoogle Scholar
  62. STONEBRAKER, M. 1981. Operating system support for database management. Commun. ACM 24, 7 (July), 412-418.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. TENG,J.Z.AND GUMAER, R. A. 1984. Managing IBM Database 2 buffers to maximize performance. IBM Syst. J. 23, 2, 211-218.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. THOMPSON, J. G. 1987. Efficient analysis of caching systems. Ph. D. thesis, Univ. of California, Berkeley. Available as Tech. Rep. CSD 87/374, Computer Science Div., Univ. of California, Berkeley, CA, Oct. 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. TPC. 1997a. TPC Benchmark TM C Standard Specification Revision 3.3.2. Transaction Processing Performance Council.]]Google ScholarGoogle Scholar
  66. TPC. 1997b. TPC Benchmark TM D Standard Specification Revision 1.3.1. Transaction Processing Performance Council.]]Google ScholarGoogle Scholar
  67. TPC. 1999a. TPC Benchmark TM H Standard Specification Revision 1.1.0. Transaction Processing Performance Council.]]Google ScholarGoogle Scholar
  68. TPC. 1999b. TPC Benchmark TM R Standard Specification Revision 1.0.1. Transaction Processing Performance Council.]]Google ScholarGoogle Scholar
  69. TSUEI, T.-F., PACKER,A.N.,AND KO, K.-T. 1997. Database buffer size investigation for OLTP workloads. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Tucson, AZ, May 1997), 112-122.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. TUEL,JR., W. G. 1976. An analysis of buffer paging in virtual storage systems. IBM J. Res. Dev. 20, 5 (Sept.), 518-520.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. TUEL,JR., W. G. AND RODRIGUEZ-ROSELL, J. 1975. A methodology for the evaluation of data base systems. Res. Rep. RJ 1668, IBM Research Laboratory, San Jose, CA, Oct. 1975.]]Google ScholarGoogle Scholar
  72. UHLIG,R.A.AND MUDGE, T. N. 1997. Trace-driven memory simulation: A survey. ACM Comput. Surv. 29, 2 (June), 128-170.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. VERKAMO, A. I. 1985. Empirical results on locality in database referencing. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (Austin, TX, Aug. 1985), 49-58.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. VISHLITZKY,N.AND OFEK, Y. 1988. Sequential cache management system utilizing the establishment of a microcache and managing the contents of such according to a threshold comparison. U.S. Patent 5706467. Filed Sep 5, 1995. Issued Jan 6, 1998.]]Google ScholarGoogle Scholar
  75. WELCH, B. B. 1991. Measured performance of caching in the Sprite network file system. Comput. Syst. 4, 3 (Summer), 315-342.]]Google ScholarGoogle Scholar
  76. WILKES, J., GOLDING, R. A., STAELIN,C.,AND SULLIVAN, T. 1996. The HP AutoRAID hierarchical storage system. ACM Trans. Comput. Syst. 14, 1 (Feb.), 108-136.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. YU,P.S.AND CORNELL, D. W. 1993. Buffer management based on return on consumption in a multi-query environment. VLDB J. 2, 1 (Jan.), 1-38.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. ZHOU, S., DA COSTA, H., AND SMITH, A. J. 1985. A file system tracing package for Berkeley UNIX.In Proceedings of the 10th Usenix Conference (Portland, OR, June 1985), 407-419.]]Google ScholarGoogle Scholar
  79. ZIVKOV,B.T.AND SMITH, A. J. 1997. Disk cache design and performance as evaluated in large timesharing and database systems. In Proceedings of the CMG (Computer Measurement Group) Conference (Orlando, FL, Dec. 1997), 639-658.]]Google ScholarGoogle Scholar

Index Terms

  1. I/O reference behavior of production database workloads and the TPC benchmarks—an analysis at the logical level

                  Recommendations

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in

                  Full Access

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader