Automatic Database Management System Tuning Through Large-scale Machine Learning

Authors:
Dana Van Aken

Carnegie Mellon University, Pittsburgh, PA, USA

Carnegie Mellon University, Pittsburgh, PA, USA
View Profile

,
Andrew Pavlo

Carnegie Mellon University, Pittsburgh, PA, USA

Carnegie Mellon University, Pittsburgh, PA, USA
View Profile

,
Geoffrey J. Gordon

Carnegie Mellon University, Pittsburgh, PA, USA

Carnegie Mellon University, Pittsburgh, PA, USA
View Profile

,
Bohan Zhang

Peking University, Beijing, China

Peking University, Beijing, China
View Profile

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of DataMay 2017Pages 1009–1024https://doi.org/10.1145/3035918.3064029

Published:09 May 2017Publication History

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

Pages 1009–1024

ABSTRACT

Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that they are not standardized (i.e., two DBMSs use a different name for the same knob), not independent (i.e., changing one knob can impact others), and not universal (i.e., what works for one application may be sub-optimal for another). Worse, information about the effects of the knobs typically comes only from (expensive) experience.

To overcome these challenges, we present an automated approach that leverages past experience and collects new information to tune DBMS configurations: we use a combination of supervised and unsupervised machine learning methods to (1) select the most impactful knobs, (2) map unseen database workloads to previous workloads from which we can transfer experience, and (3) recommend knob settings. We implemented our techniques in a new tool called OtterTune and tested it on two DBMSs. Our evaluation shows that OtterTune recommends configurations that are as good as or better than ones generated by existing tools or a human expert.

References

MySQL -- InnoDB startup options and system variables. http: //dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html.Google Scholar
MySQL Tuning Primer Script. https://launchpad.net/mysql-tuning-primer.Google Scholar
OLTPBenchmark.com. http://oltpbenchmark.com.Google Scholar
PostgreSQL Configuration Wizard. http://pgfoundry.org/projects/pgtune/.Google Scholar
scikit-learn Documentation -- Factor Analysis. http://scikit-learn.org/stable/modules/generated/sklearn. decomposition.FactorAnalysis.html.Google Scholar
scikit-learn Documentation -- KMeans. http://scikit-learn. org/stable/modules/generated/sklearn.cluster.KMeans.html.Google Scholar
S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In VLDB, 2000. Google ScholarDigital Library
S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, 2004. Google ScholarDigital Library
J. C. Barrett, D. G. Clayton, P. Concannon, B. Akolkar, J. D. Cooper, H. A. Erlich, C. Julier, G. Morahan, J. Nerup, C. Nierras, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature genetics, 41(6):703--707, 2009.Google ScholarCross Ref
P. Belknap, B. Dageville, K. Dias, and K. Yagoub. Self-tuning for SQL performance in Oracle Database 11g. In ICDE, pages 1694--1700, 2009. Google ScholarDigital Library
P. Bernstein, M. Brodie, S. Ceri, D. DeWitt, M. Franklin, H. Garcia-Molina, J. Gray, J. Held, J. Hellerstein, H. Jagadish, et al. The asilomar report on database research. SIGMOD record , 27(4):74--80, 1998. Google ScholarDigital Library
P. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. 2014.Google Scholar
K. P. Brown, M. J. Carey, and M. Livny. Goal-oriented buffer management revisited. In SIGMOD, pages 353--364, 1996. Google ScholarDigital Library
G. Casella and R. L. Berger. Statistical Inference. Duxbury advanced series in statistics and decision sciences. Duxbury Press, 2002.Google Scholar
S. Chaudhuri and V. Narasayya. Autoadmin "what-if" index analysis utility. SIGMOD Rec. , 27(2):367--378, 1998. Google ScholarDigital Library
S. Chaudhuri and V. Narasayya. Self-tuning database systems: a decade of progress. In VLDB, pages 3--14, 2007. Google ScholarDigital Library
S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft SQL server. In VLDB , pages 146--155, 1997. Google ScholarDigital Library
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In SoCC, pages 143--154, 2010. Google ScholarDigital Library
C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a workload-drive approach to database replication and partitioning. In VLDB, 2010. Google ScholarDigital Library
E. Danna and L. Perron. Structured vs. unstructured large neighborhood search: A case study on job-shop scheduling problems with earliness and tardiness costs. In Principles and Practice of Constraint Programming, volume 2833, pages 817--821, 2003.Google Scholar
B. Debnath, D. Lilja, and M. Mokbel. SARD: A statistical approach for ranking database tuning parameters. In ICDEW, pages 11--18, 2008. Google ScholarDigital Library
K. Dias, M. Ramacher, U. Shaft, V. Venkataramani, and G. Wood. Automatic performance diagnosis and tuning in oracle. In CIdR, 2005.Google Scholar
D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux. OLTP-Bench: an extensible testbed for benchmarking relational databases. In VLDB, pages 277--288, 2013. Google ScholarDigital Library
S. Duan, V. Thummala, and S. Babu. Tuning database configuration parameters with iTuned. VLDB, 2:1246--1257, August 2009. Google ScholarDigital Library
D. Dworin. Data science revealed: A data-driven glimpse into the burgeoning new field. Dec. 2011.Google Scholar
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407--499, 2004.Google ScholarCross Ref
F. Focacci, F. Laburthe, and A. Lodi. Handbook of Metaheuristics, chapter Local Search and Constraint Programming. Springer, 2003.Google Scholar
H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman. Index selection for olap. In ICDE, pages 208--219, 1997. Google ScholarDigital Library
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.Google ScholarCross Ref
A. Jain, M. Murty, and P. Flynn. Data clustering: A review. volume 31, pages 264--323, 1999. Google ScholarDigital Library
A. K. Jain and R. C. Dubes. Algorithms for clustering data. Prentice-Hall, Inc., 1988. Google ScholarDigital Library
A. Krause and C. S. Ong. Contextual gaussian process bandit optimization. In NIPS, pages 2447--2455, 2011. Google ScholarDigital Library
S. Kumar. Oracle Database 10g: The self-managing database, Nov. 2003. White Paper.Google Scholar
E. Kwan, S. Lightstone, A. Storm, and L. Wu. Automatic configuration for IBM DB2 universal database. Technical report, IBM, jan 2002.Google Scholar
D. Laney. 3-D data management: Controlling data volume, velocity and variety. Feb. 2001.Google Scholar
M. Linster. Best practices for becoming an exceptional postgres dba. http://www.enterprisedb.com/ best-practices-becoming-exceptional-postgres-dba, Aug. 2014.Google Scholar
D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting DBMS. In MASCOTS, pages 239--248, 2005. Google ScholarDigital Library
A. Pavlo, E. P. Jones, and S. Zdonik. On predictive modeling for optimizing transaction execution in parallel OLTP systems. VLDB, 5:85--96, October 2011. Google ScholarDigital Library
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825--2830, 2011. Google ScholarDigital Library
D. T. Pham, S. S. Dimov, and C. D. Nguyen. Selection of k in k-means clustering. In IMechE, volume 219, 2005.Google ScholarCross Ref
J. Rao, C. Zhang, N. Megiddo, and G. Lohman. Automating physical database design in a parallel database. In SIGMOD, pages 558--569, 2002. Google ScholarDigital Library
C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006. Google ScholarDigital Library
A. Rosenberg. Improving query performance in data warehouses. Business Intelligence Journal, 11, Jan. 2006.Google Scholar
A. A. Soror, U. F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, and S. Kamath. Automatic virtual machine configuration for database workloads. In SIGMOD, pages 953--966, 2008. Google ScholarDigital Library
N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimizatioin in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, 2010.Google ScholarDigital Library
M. Stonebraker, S. Madden, and P. Dubey. Intel "big data" science and technology center vision and execution plan. SIGMOD Rec., 42(1):44--49, May 2013. Google ScholarDigital Library
A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In VLDB, pages 1081--1092, 2006. Google ScholarDigital Library
C. Sugar. Techniques for clustering and classification with applications to medical problems. PhD thesis, Stanford University, 1998.Google Scholar
D. G. Sullivan, M. I. Seltzer, and A. Pfeffer. Using probabilistic reasoning to automate software tuning. SIGMETRICS, pages 404--405, 2004. Google ScholarDigital Library
M. A. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR, abs/1603.04467, 2016.Google Scholar
The Transaction Processing Council. TPC-C Benchmark (Revision 5.9.0). http://www.tpc.org/tpcc/spec/tpcc_current.pdf, June 2007.Google Scholar
The Transaction Processing Council. TPC-H Benchmark (Revision 2.16.0). http://www.tpc.org/tpch/spec/tpch2.16.0.pdf, December 2013.Google Scholar
W. Tian, P. Martin, and W. Powley. Techniques for automatically sizing multiple buffer pools in DB2. In CASCON, pages 294--302, 2003. Google ScholarDigital Library
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58:267--288, 1996.Google ScholarCross Ref
R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63:411--423, 2001.Google Scholar
R. J. Tibshirani, A. Rinaldo, R. Tibshirani, and L. Wasserman. Uniform asymptotic inference and the bootstrap after model selection. arXiv preprint arXiv:1506.06266, 2015.Google Scholar
R. J. Tibshirani, J. Taylor, R. Lockhart, and R. Tibshirani. Exact post-selection inference for sequential regression procedures. arXiv preprint arXiv:1401.3889, 2014.Google Scholar
D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1):3:1--3:25, May 2008. Google ScholarDigital Library
G. Valentin, M. Zuliani, D. Zilio, G. Lohman, and A. Skelley. DB2 advisor: an optimizer smart enough to recommend its own indexes. In ICDE, pages 101--110, 2000. Google ScholarDigital Library
G. Weikum, C. Hasse, A. Mönkeberg, and P. Zabback. The COMFORT automatic tuning project. Information Systems, 19(5):381--432, July 1994. Google ScholarDigital Library
B. Xi, Z. Liu, M. Raghavachari, C. H. Xia, and L. Zhang. A smart hill-climbing algorithm for application server configuration. In WWW, pages 287--296, 2004. Google ScholarDigital Library
K. Yagoub, P. Belknap, B. Dageville, K. Dias, S. Joshi, and H. Yu. Oracle's sql performance analyzer. IEEE Data Engineering Bulletin, 31(1), 2008.Google Scholar
D. Y. Yoon, N. Niu, and B. Mozafari. DBSherlock: a performance diagnostic tool for transactional databases. In SIGMOD, pages 1599--1614, 2016. Google ScholarDigital Library
C. Zhang, A. Kumar, and C. Ré. Materialization optimizations for feature selection workloads. In SIGMOD, pages 265--276, 2014. Google ScholarDigital Library
D. C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, University of Toronto, 1998. Google ScholarDigital Library

Index Terms

Automatic Database Management System Tuning Through Large-scale Machine Learning
1. Information systems
  1. Data management systems
    1. Database administration
      1. Autonomous database administration
      2. Database utilities and tools
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory

Recommendations

A demonstration of the ottertune automatic database management system tuning service

Database management systems (DBMSs) have a plethora of tunable knobs that control almost everything in the system. The performance of a DBMS is highly dependent on these configuration knobs, however, getting this tuning right is hard. Many organizations ...
Read More
An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Configuration tuning is vital to optimize the performance of database management system (DBMS). It becomes more tedious and urgent for cloud databases (CDB) due to the diverse database instances and query workloads, which make the database administrator ...
Read More
Policy-Based Coordination of Best-Practice Oriented Autonomic Database Tuning
COMPUTATIONWORLD '09: Proceedings of the 2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns

In the last decades databases have been growing rapidly in scale and complexity. High performance, availability and further service level agreements need to be satisfied under any circumstances to please customers. Achieved database management system’s (...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
General Chairs:
Rada Chirkova
North Carolina State University, USA
,
Jun Yang
Duke University, USA
,
Program Chair:
Dan Suciu
University of Washington, USA
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 May 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
autonomic computing
database management systems
database tuning
machine learning
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 282
  Total Citations
  View Citations
- 8,627
  Total Downloads
- Downloads (Last 12 months)1,772
- Downloads (Last 6 weeks)238
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Automatic Database Management System Tuning Through Large-scale Machine Learning

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

A demonstration of the ottertune automatic database management system tuning service

An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning

Policy-Based Coordination of Best-Practice Oriented Autonomic Database Tuning