ABSTRACT
Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that they are not standardized (i.e., two DBMSs use a different name for the same knob), not independent (i.e., changing one knob can impact others), and not universal (i.e., what works for one application may be sub-optimal for another). Worse, information about the effects of the knobs typically comes only from (expensive) experience.
To overcome these challenges, we present an automated approach that leverages past experience and collects new information to tune DBMS configurations: we use a combination of supervised and unsupervised machine learning methods to (1) select the most impactful knobs, (2) map unseen database workloads to previous workloads from which we can transfer experience, and (3) recommend knob settings. We implemented our techniques in a new tool called OtterTune and tested it on two DBMSs. Our evaluation shows that OtterTune recommends configurations that are as good as or better than ones generated by existing tools or a human expert.
- MySQL -- InnoDB startup options and system variables. http: //dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html.Google Scholar
- MySQL Tuning Primer Script. https://launchpad.net/mysql-tuning-primer.Google Scholar
- OLTPBenchmark.com. http://oltpbenchmark.com.Google Scholar
- PostgreSQL Configuration Wizard. http://pgfoundry.org/projects/pgtune/.Google Scholar
- scikit-learn Documentation -- Factor Analysis. http://scikit-learn.org/stable/modules/generated/sklearn. decomposition.FactorAnalysis.html.Google Scholar
- scikit-learn Documentation -- KMeans. http://scikit-learn. org/stable/modules/generated/sklearn.cluster.KMeans.html.Google Scholar
- S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In VLDB, 2000. Google ScholarDigital Library
- S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, 2004. Google ScholarDigital Library
- J. C. Barrett, D. G. Clayton, P. Concannon, B. Akolkar, J. D. Cooper, H. A. Erlich, C. Julier, G. Morahan, J. Nerup, C. Nierras, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature genetics, 41(6):703--707, 2009.Google ScholarCross Ref
- P. Belknap, B. Dageville, K. Dias, and K. Yagoub. Self-tuning for SQL performance in Oracle Database 11g. In ICDE, pages 1694--1700, 2009. Google ScholarDigital Library
- P. Bernstein, M. Brodie, S. Ceri, D. DeWitt, M. Franklin, H. Garcia-Molina, J. Gray, J. Held, J. Hellerstein, H. Jagadish, et al. The asilomar report on database research. SIGMOD record , 27(4):74--80, 1998. Google ScholarDigital Library
- P. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. 2014.Google Scholar
- K. P. Brown, M. J. Carey, and M. Livny. Goal-oriented buffer management revisited. In SIGMOD, pages 353--364, 1996. Google ScholarDigital Library
- G. Casella and R. L. Berger. Statistical Inference. Duxbury advanced series in statistics and decision sciences. Duxbury Press, 2002.Google Scholar
- S. Chaudhuri and V. Narasayya. Autoadmin "what-if" index analysis utility. SIGMOD Rec. , 27(2):367--378, 1998. Google ScholarDigital Library
- S. Chaudhuri and V. Narasayya. Self-tuning database systems: a decade of progress. In VLDB, pages 3--14, 2007. Google ScholarDigital Library
- S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft SQL server. In VLDB , pages 146--155, 1997. Google ScholarDigital Library
- B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In SoCC, pages 143--154, 2010. Google ScholarDigital Library
- C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a workload-drive approach to database replication and partitioning. In VLDB, 2010. Google ScholarDigital Library
- E. Danna and L. Perron. Structured vs. unstructured large neighborhood search: A case study on job-shop scheduling problems with earliness and tardiness costs. In Principles and Practice of Constraint Programming, volume 2833, pages 817--821, 2003.Google Scholar
- B. Debnath, D. Lilja, and M. Mokbel. SARD: A statistical approach for ranking database tuning parameters. In ICDEW, pages 11--18, 2008. Google ScholarDigital Library
- K. Dias, M. Ramacher, U. Shaft, V. Venkataramani, and G. Wood. Automatic performance diagnosis and tuning in oracle. In CIdR, 2005.Google Scholar
- D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux. OLTP-Bench: an extensible testbed for benchmarking relational databases. In VLDB, pages 277--288, 2013. Google ScholarDigital Library
- S. Duan, V. Thummala, and S. Babu. Tuning database configuration parameters with iTuned. VLDB, 2:1246--1257, August 2009. Google ScholarDigital Library
- D. Dworin. Data science revealed: A data-driven glimpse into the burgeoning new field. Dec. 2011.Google Scholar
- B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407--499, 2004.Google ScholarCross Ref
- F. Focacci, F. Laburthe, and A. Lodi. Handbook of Metaheuristics, chapter Local Search and Constraint Programming. Springer, 2003.Google Scholar
- H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman. Index selection for olap. In ICDE, pages 208--219, 1997. Google ScholarDigital Library
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.Google ScholarCross Ref
- A. Jain, M. Murty, and P. Flynn. Data clustering: A review. volume 31, pages 264--323, 1999. Google ScholarDigital Library
- A. K. Jain and R. C. Dubes. Algorithms for clustering data. Prentice-Hall, Inc., 1988. Google ScholarDigital Library
- A. Krause and C. S. Ong. Contextual gaussian process bandit optimization. In NIPS, pages 2447--2455, 2011. Google ScholarDigital Library
- S. Kumar. Oracle Database 10g: The self-managing database, Nov. 2003. White Paper.Google Scholar
- E. Kwan, S. Lightstone, A. Storm, and L. Wu. Automatic configuration for IBM DB2 universal database. Technical report, IBM, jan 2002.Google Scholar
- D. Laney. 3-D data management: Controlling data volume, velocity and variety. Feb. 2001.Google Scholar
- M. Linster. Best practices for becoming an exceptional postgres dba. http://www.enterprisedb.com/ best-practices-becoming-exceptional-postgres-dba, Aug. 2014.Google Scholar
- D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting DBMS. In MASCOTS, pages 239--248, 2005. Google ScholarDigital Library
- A. Pavlo, E. P. Jones, and S. Zdonik. On predictive modeling for optimizing transaction execution in parallel OLTP systems. VLDB, 5:85--96, October 2011. Google ScholarDigital Library
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825--2830, 2011. Google ScholarDigital Library
- D. T. Pham, S. S. Dimov, and C. D. Nguyen. Selection of k in k-means clustering. In IMechE, volume 219, 2005.Google ScholarCross Ref
- J. Rao, C. Zhang, N. Megiddo, and G. Lohman. Automating physical database design in a parallel database. In SIGMOD, pages 558--569, 2002. Google ScholarDigital Library
- C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006. Google ScholarDigital Library
- A. Rosenberg. Improving query performance in data warehouses. Business Intelligence Journal, 11, Jan. 2006.Google Scholar
- A. A. Soror, U. F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, and S. Kamath. Automatic virtual machine configuration for database workloads. In SIGMOD, pages 953--966, 2008. Google ScholarDigital Library
- N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimizatioin in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, 2010.Google ScholarDigital Library
- M. Stonebraker, S. Madden, and P. Dubey. Intel "big data" science and technology center vision and execution plan. SIGMOD Rec., 42(1):44--49, May 2013. Google ScholarDigital Library
- A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In VLDB, pages 1081--1092, 2006. Google ScholarDigital Library
- C. Sugar. Techniques for clustering and classification with applications to medical problems. PhD thesis, Stanford University, 1998.Google Scholar
- D. G. Sullivan, M. I. Seltzer, and A. Pfeffer. Using probabilistic reasoning to automate software tuning. SIGMETRICS, pages 404--405, 2004. Google ScholarDigital Library
- M. A. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR, abs/1603.04467, 2016.Google Scholar
- The Transaction Processing Council. TPC-C Benchmark (Revision 5.9.0). http://www.tpc.org/tpcc/spec/tpcc_current.pdf, June 2007.Google Scholar
- The Transaction Processing Council. TPC-H Benchmark (Revision 2.16.0). http://www.tpc.org/tpch/spec/tpch2.16.0.pdf, December 2013.Google Scholar
- W. Tian, P. Martin, and W. Powley. Techniques for automatically sizing multiple buffer pools in DB2. In CASCON, pages 294--302, 2003. Google ScholarDigital Library
- R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58:267--288, 1996.Google ScholarCross Ref
- R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63:411--423, 2001.Google Scholar
- R. J. Tibshirani, A. Rinaldo, R. Tibshirani, and L. Wasserman. Uniform asymptotic inference and the bootstrap after model selection. arXiv preprint arXiv:1506.06266, 2015.Google Scholar
- R. J. Tibshirani, J. Taylor, R. Lockhart, and R. Tibshirani. Exact post-selection inference for sequential regression procedures. arXiv preprint arXiv:1401.3889, 2014.Google Scholar
- D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1):3:1--3:25, May 2008. Google ScholarDigital Library
- G. Valentin, M. Zuliani, D. Zilio, G. Lohman, and A. Skelley. DB2 advisor: an optimizer smart enough to recommend its own indexes. In ICDE, pages 101--110, 2000. Google ScholarDigital Library
- G. Weikum, C. Hasse, A. Mönkeberg, and P. Zabback. The COMFORT automatic tuning project. Information Systems, 19(5):381--432, July 1994. Google ScholarDigital Library
- B. Xi, Z. Liu, M. Raghavachari, C. H. Xia, and L. Zhang. A smart hill-climbing algorithm for application server configuration. In WWW, pages 287--296, 2004. Google ScholarDigital Library
- K. Yagoub, P. Belknap, B. Dageville, K. Dias, S. Joshi, and H. Yu. Oracle's sql performance analyzer. IEEE Data Engineering Bulletin, 31(1), 2008.Google Scholar
- D. Y. Yoon, N. Niu, and B. Mozafari. DBSherlock: a performance diagnostic tool for transactional databases. In SIGMOD, pages 1599--1614, 2016. Google ScholarDigital Library
- C. Zhang, A. Kumar, and C. Ré. Materialization optimizations for feature selection workloads. In SIGMOD, pages 265--276, 2014. Google ScholarDigital Library
- D. C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, University of Toronto, 1998. Google ScholarDigital Library
Index Terms
- Automatic Database Management System Tuning Through Large-scale Machine Learning
Recommendations
A demonstration of the ottertune automatic database management system tuning service
Database management systems (DBMSs) have a plethora of tunable knobs that control almost everything in the system. The performance of a DBMS is highly dependent on these configuration knobs, however, getting this tuning right is hard. Many organizations ...
An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning
SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataConfiguration tuning is vital to optimize the performance of database management system (DBMS). It becomes more tedious and urgent for cloud databases (CDB) due to the diverse database instances and query workloads, which make the database administrator ...
Policy-Based Coordination of Best-Practice Oriented Autonomic Database Tuning
COMPUTATIONWORLD '09: Proceedings of the 2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, PatternsIn the last decades databases have been growing rapidly in scale and complexity. High performance, availability and further service level agreements need to be satisfied under any circumstances to please customers. Achieved database management system’s (...
Comments