skip to main content
10.1145/3035918.3064029acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Automatic Database Management System Tuning Through Large-scale Machine Learning

Published:09 May 2017Publication History

ABSTRACT

Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that they are not standardized (i.e., two DBMSs use a different name for the same knob), not independent (i.e., changing one knob can impact others), and not universal (i.e., what works for one application may be sub-optimal for another). Worse, information about the effects of the knobs typically comes only from (expensive) experience.

To overcome these challenges, we present an automated approach that leverages past experience and collects new information to tune DBMS configurations: we use a combination of supervised and unsupervised machine learning methods to (1) select the most impactful knobs, (2) map unseen database workloads to previous workloads from which we can transfer experience, and (3) recommend knob settings. We implemented our techniques in a new tool called OtterTune and tested it on two DBMSs. Our evaluation shows that OtterTune recommends configurations that are as good as or better than ones generated by existing tools or a human expert.

References

  1. MySQL -- InnoDB startup options and system variables. http: //dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html.Google ScholarGoogle Scholar
  2. MySQL Tuning Primer Script. https://launchpad.net/mysql-tuning-primer.Google ScholarGoogle Scholar
  3. OLTPBenchmark.com. http://oltpbenchmark.com.Google ScholarGoogle Scholar
  4. PostgreSQL Configuration Wizard. http://pgfoundry.org/projects/pgtune/.Google ScholarGoogle Scholar
  5. scikit-learn Documentation -- Factor Analysis. http://scikit-learn.org/stable/modules/generated/sklearn. decomposition.FactorAnalysis.html.Google ScholarGoogle Scholar
  6. scikit-learn Documentation -- KMeans. http://scikit-learn. org/stable/modules/generated/sklearn.cluster.KMeans.html.Google ScholarGoogle Scholar
  7. S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. C. Barrett, D. G. Clayton, P. Concannon, B. Akolkar, J. D. Cooper, H. A. Erlich, C. Julier, G. Morahan, J. Nerup, C. Nierras, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature genetics, 41(6):703--707, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  10. P. Belknap, B. Dageville, K. Dias, and K. Yagoub. Self-tuning for SQL performance in Oracle Database 11g. In ICDE, pages 1694--1700, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Bernstein, M. Brodie, S. Ceri, D. DeWitt, M. Franklin, H. Garcia-Molina, J. Gray, J. Held, J. Hellerstein, H. Jagadish, et al. The asilomar report on database research. SIGMOD record , 27(4):74--80, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. 2014.Google ScholarGoogle Scholar
  13. K. P. Brown, M. J. Carey, and M. Livny. Goal-oriented buffer management revisited. In SIGMOD, pages 353--364, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Casella and R. L. Berger. Statistical Inference. Duxbury advanced series in statistics and decision sciences. Duxbury Press, 2002.Google ScholarGoogle Scholar
  15. S. Chaudhuri and V. Narasayya. Autoadmin "what-if" index analysis utility. SIGMOD Rec. , 27(2):367--378, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Chaudhuri and V. Narasayya. Self-tuning database systems: a decade of progress. In VLDB, pages 3--14, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft SQL server. In VLDB , pages 146--155, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In SoCC, pages 143--154, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a workload-drive approach to database replication and partitioning. In VLDB, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. E. Danna and L. Perron. Structured vs. unstructured large neighborhood search: A case study on job-shop scheduling problems with earliness and tardiness costs. In Principles and Practice of Constraint Programming, volume 2833, pages 817--821, 2003.Google ScholarGoogle Scholar
  21. B. Debnath, D. Lilja, and M. Mokbel. SARD: A statistical approach for ranking database tuning parameters. In ICDEW, pages 11--18, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. K. Dias, M. Ramacher, U. Shaft, V. Venkataramani, and G. Wood. Automatic performance diagnosis and tuning in oracle. In CIdR, 2005.Google ScholarGoogle Scholar
  23. D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux. OLTP-Bench: an extensible testbed for benchmarking relational databases. In VLDB, pages 277--288, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Duan, V. Thummala, and S. Babu. Tuning database configuration parameters with iTuned. VLDB, 2:1246--1257, August 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. D. Dworin. Data science revealed: A data-driven glimpse into the burgeoning new field. Dec. 2011.Google ScholarGoogle Scholar
  26. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407--499, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  27. F. Focacci, F. Laburthe, and A. Lodi. Handbook of Metaheuristics, chapter Local Search and Constraint Programming. Springer, 2003.Google ScholarGoogle Scholar
  28. H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman. Index selection for olap. In ICDE, pages 208--219, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  30. A. Jain, M. Murty, and P. Flynn. Data clustering: A review. volume 31, pages 264--323, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. A. K. Jain and R. C. Dubes. Algorithms for clustering data. Prentice-Hall, Inc., 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Krause and C. S. Ong. Contextual gaussian process bandit optimization. In NIPS, pages 2447--2455, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. S. Kumar. Oracle Database 10g: The self-managing database, Nov. 2003. White Paper.Google ScholarGoogle Scholar
  34. E. Kwan, S. Lightstone, A. Storm, and L. Wu. Automatic configuration for IBM DB2 universal database. Technical report, IBM, jan 2002.Google ScholarGoogle Scholar
  35. D. Laney. 3-D data management: Controlling data volume, velocity and variety. Feb. 2001.Google ScholarGoogle Scholar
  36. M. Linster. Best practices for becoming an exceptional postgres dba. http://www.enterprisedb.com/ best-practices-becoming-exceptional-postgres-dba, Aug. 2014.Google ScholarGoogle Scholar
  37. D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting DBMS. In MASCOTS, pages 239--248, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. A. Pavlo, E. P. Jones, and S. Zdonik. On predictive modeling for optimizing transaction execution in parallel OLTP systems. VLDB, 5:85--96, October 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825--2830, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. D. T. Pham, S. S. Dimov, and C. D. Nguyen. Selection of k in k-means clustering. In IMechE, volume 219, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  41. J. Rao, C. Zhang, N. Megiddo, and G. Lohman. Automating physical database design in a parallel database. In SIGMOD, pages 558--569, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. A. Rosenberg. Improving query performance in data warehouses. Business Intelligence Journal, 11, Jan. 2006.Google ScholarGoogle Scholar
  44. A. A. Soror, U. F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, and S. Kamath. Automatic virtual machine configuration for database workloads. In SIGMOD, pages 953--966, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimizatioin in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. M. Stonebraker, S. Madden, and P. Dubey. Intel "big data" science and technology center vision and execution plan. SIGMOD Rec., 42(1):44--49, May 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In VLDB, pages 1081--1092, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. C. Sugar. Techniques for clustering and classification with applications to medical problems. PhD thesis, Stanford University, 1998.Google ScholarGoogle Scholar
  49. D. G. Sullivan, M. I. Seltzer, and A. Pfeffer. Using probabilistic reasoning to automate software tuning. SIGMETRICS, pages 404--405, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. M. A. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR, abs/1603.04467, 2016.Google ScholarGoogle Scholar
  51. The Transaction Processing Council. TPC-C Benchmark (Revision 5.9.0). http://www.tpc.org/tpcc/spec/tpcc_current.pdf, June 2007.Google ScholarGoogle Scholar
  52. The Transaction Processing Council. TPC-H Benchmark (Revision 2.16.0). http://www.tpc.org/tpch/spec/tpch2.16.0.pdf, December 2013.Google ScholarGoogle Scholar
  53. W. Tian, P. Martin, and W. Powley. Techniques for automatically sizing multiple buffer pools in DB2. In CASCON, pages 294--302, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58:267--288, 1996.Google ScholarGoogle ScholarCross RefCross Ref
  55. R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63:411--423, 2001.Google ScholarGoogle Scholar
  56. R. J. Tibshirani, A. Rinaldo, R. Tibshirani, and L. Wasserman. Uniform asymptotic inference and the bootstrap after model selection. arXiv preprint arXiv:1506.06266, 2015.Google ScholarGoogle Scholar
  57. R. J. Tibshirani, J. Taylor, R. Lockhart, and R. Tibshirani. Exact post-selection inference for sequential regression procedures. arXiv preprint arXiv:1401.3889, 2014.Google ScholarGoogle Scholar
  58. D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1):3:1--3:25, May 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. G. Valentin, M. Zuliani, D. Zilio, G. Lohman, and A. Skelley. DB2 advisor: an optimizer smart enough to recommend its own indexes. In ICDE, pages 101--110, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. G. Weikum, C. Hasse, A. Mönkeberg, and P. Zabback. The COMFORT automatic tuning project. Information Systems, 19(5):381--432, July 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. B. Xi, Z. Liu, M. Raghavachari, C. H. Xia, and L. Zhang. A smart hill-climbing algorithm for application server configuration. In WWW, pages 287--296, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. K. Yagoub, P. Belknap, B. Dageville, K. Dias, S. Joshi, and H. Yu. Oracle's sql performance analyzer. IEEE Data Engineering Bulletin, 31(1), 2008.Google ScholarGoogle Scholar
  63. D. Y. Yoon, N. Niu, and B. Mozafari. DBSherlock: a performance diagnostic tool for transactional databases. In SIGMOD, pages 1599--1614, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. C. Zhang, A. Kumar, and C. Ré. Materialization optimizations for feature selection workloads. In SIGMOD, pages 265--276, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. D. C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, University of Toronto, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Automatic Database Management System Tuning Through Large-scale Machine Learning

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
          May 2017
          1810 pages
          ISBN:9781450341974
          DOI:10.1145/3035918

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 9 May 2017

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader