Abstract
Given a parametrized n-dimensional SQL query template and a choice of query optimizer, a plan diagram is a color-coded pictorial enumeration of the execution plan choices of the optimizer over the query parameter space. These diagrams have proved to be a powerful metaphor for the analysis and redesign of modern optimizers, and are gaining currency in diverse industrial and academic institutions. However, their utility is adversely impacted by the impractically large computational overheads incurred when standard brute-force exhaustive approaches are used for producing fine-grained diagrams on high-dimensional query templates.
In this paper, we investigate strategies for efficiently producing close approximations to complex plan diagrams. Our techniques are customized to the features available in the optimizer's API, ranging from the generic optimizers that provide only the optimal plan for a query, to those that also support costing of sub-optimal plans and enumerating rank-ordered lists of plans. The techniques collectively feature both random and grid sampling, as well as inference techniques based on nearest-neighbor classifiers, parametric query optimization and plan cost monotonicity.
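As a rough illustration of how sampling and nearest-neighbor inference can combine, the sketch below optimizes only a subset of points in a 2-D parameter grid and labels each remaining point with the plan of its closest sampled neighbor. It is a minimal sketch under stated assumptions, not the paper's algorithm: `toy_optimizer` is a hypothetical stand-in for a real optimizer call, and the function names, grid resolution, and error measure are all chosen here for illustration only.

```python
import random
from typing import Callable, Dict, Tuple

Point = Tuple[int, int]
Diagram = Dict[Point, str]


def toy_optimizer(x: int, y: int) -> str:
    """Hypothetical stand-in for an optimizer call: returns a plan label."""
    if x + y < 20:
        return "P1"
    return "P2" if x > y else "P3"


def exhaustive_diagram(optimize: Callable[[int, int], str], res: int) -> Diagram:
    """Brute-force baseline: one optimizer call per grid point."""
    return {(x, y): optimize(x, y) for x in range(res) for y in range(res)}


def approximate_diagram(
    optimize: Callable[[int, int], str],
    res: int,
    step: int,
    extra_random: int = 0,
    seed: int = 0,
) -> Tuple[Diagram, int]:
    """Grid sampling plus optional random sampling, then 1-NN inference.

    Returns the approximate diagram and the number of optimizer calls made.
    """
    rng = random.Random(seed)
    all_points = [(x, y) for x in range(res) for y in range(res)]

    # Grid sampling: optimize every `step`-th point along each dimension.
    sampled: Diagram = {
        (x, y): optimize(x, y)
        for x in range(0, res, step)
        for y in range(0, res, step)
    }

    # Random sampling: optimize a few additional, not-yet-sampled points.
    unsampled = [p for p in all_points if p not in sampled]
    for p in rng.sample(unsampled, min(extra_random, len(unsampled))):
        sampled[p] = optimize(*p)

    def nearest_plan(p: Point) -> str:
        # 1-nearest-neighbor classifier over the sampled points
        # (squared Euclidean distance; ties broken by insertion order).
        q = min(sampled, key=lambda s: (s[0] - p[0]) ** 2 + (s[1] - p[1]) ** 2)
        return sampled[q]

    diagram = {p: sampled[p] if p in sampled else nearest_plan(p) for p in all_points}
    return diagram, len(sampled)


def error_fraction(approx: Diagram, exact: Diagram) -> float:
    """Fraction of grid points whose inferred plan differs from the true plan."""
    return sum(1 for p in exact if approx[p] != exact[p]) / len(exact)


if __name__ == "__main__":
    exact = exhaustive_diagram(toy_optimizer, res=30)
    approx, calls = approximate_diagram(toy_optimizer, res=30, step=3)
    print(f"optimizer calls: {calls} of {len(exact)}")
    print(f"plan-identity error: {error_fraction(approx, exact):.3f}")
```

In this toy setting, errors can only occur at unsampled points near plan boundaries, which is why a coarse sample can already recover most of the diagram. A real deployment would replace `toy_optimizer` with an actual optimizer invocation and would likely refine the sample adaptively rather than use a fixed step.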
Extensive experimentation with a representative set of TPC-H and TPC-DS-based query templates on industrial-strength optimizers indicates that our techniques are capable of delivering 90% accurate diagrams while incurring less than 15% of the computational overheads of the exhaustive approach. In fact, for full-featured optimizers, we can guarantee zero error with less than 10% overheads. These approximation techniques have been implemented in the publicly available Picasso optimizer visualization tool.