Abstract
Identifying efficient execution plans for declarative OLAP queries typically entails estimation of several predicate selectivities. In practice, these estimates often differ significantly from the values actually encountered during query execution, leading to poor plan choices and grossly inflated response times. We propose here a conceptually new approach to address this classical problem, wherein the compile-time estimation process is completely eschewed for error-prone selectivities. Instead, from the set of optimal plans in the query’s selectivity error space, a limited subset, called the “plan bouquet,” is selected such that at least one of the bouquet plans is 2-optimal at each location in the space. Then, at run time, a sequence of cost-budgeted executions from the plan bouquet is carried out, eventually finding a plan that executes to completion within its assigned budget. The duration and switching of these executions is controlled by a graded progression of isosurfaces projected onto the optimal performance profile. We prove that this construction results, for the first time, in guarantees on worst-case performance sub-optimality. Moreover, it ensures repeatable execution strategies across different invocations of a query.
We then present a suite of enhancements to the basic plan bouquet algorithm, including randomized variants, that result in significantly stronger performance guarantees. An efficient isosurface identification algorithm is also introduced to curtail the bouquet construction overheads.
The plan bouquet approach has been empirically evaluated on both PostgreSQL and a commercial DBMS, over the TPC-H and TPC-DS benchmark environments. Our experimental results indicate that it delivers substantial improvements in the worst-case behavior, without impairing the average-case performance, as compared to the native optimizers of these systems. Moreover, it can be implemented using existing optimizer infrastructure, making it relatively easy to incorporate in current database engines.
Overall, the plan bouquet approach provides novel performance guarantees that open up new possibilities for robust query processing.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, Plan Bouquets: A Fragrant Approach to Robust Query Processing
- Ashraf Aboulnaga and Surajit Chaudhuri. 1999. Self-tuning histograms: Building histograms without looking at data. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’99). 181--192. Google ScholarDigital Library
- Ron Avnur and Joseph M. Hellerstein. 2000. Eddies: Continuously adaptive query processing. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’00). 261--272. Google ScholarDigital Library
- Brian Babcock and Surajit Chaudhuri. 2005. Towards a robust query optimizer: A principled and practical approach. In Proc. of the 2005 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’05). 119--130. Google ScholarDigital Library
- Shivnath Babu, Pedro Bizarro, and David DeWitt. 2005. Proactive re-optimization. In Proc. of the 2005 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’05). 107--118. Google ScholarDigital Library
- Pedro Bizarro, Nicolas Bruno, and David J. DeWitt. 2009. Progressive parametric query optimization. IEEE Trans. Knowl. Data Eng. 21, 4 (2009), 582--594. Google ScholarDigital Library
- Renata Borovica-Gajic, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski, and Campbell Fraser. 2015. Smooth scan: Statistics-oblivious access paths. In Proc. of the 31st IEEE Intl. Conf. on Data Engg. (ICDE’15). 315--326.Google ScholarCross Ref
- Surajit Chaudhuri, Hongrae Lee, and Vivek R. Narasayya. 2010. Variance aware optimization of parameterized queries. In Proc. of the 2010 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’10). 531--542. Google ScholarDigital Library
- Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy. 2008. A pay-as-you-go framework for query execution feedback. In Proc. VLDB 1, 1 (2008), 1141--1152. Google ScholarDigital Library
- Marek Chrobak, Claire Kenyon, John Noga, and Neal E. Young. 2008. Incremental medians via online bidding. Algorithmica 50, 4 (2008), 455--478.Google ScholarDigital Library
- Francis Chu, Joseph Halpern, and Johannes Gehrke. 2002. Least expected cost query optimization: What can we expect? In Proc. of the 21st Symposium on Principles of Database Systems (PODS’02). 293--302. Google ScholarDigital Library
- Richard L. Cole and Goetz Graefe. 1994. Optimization of dynamic query evaluation plans. In Proc. of the 1994 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’94). 150--160. Google ScholarDigital Library
- Amol Deshpande, Zachary Ives, and Vijayshankar Raman. 2007. Adaptive query processing. Foundations and Trends in Databases 1, 1 (2007), 1--140. Google ScholarDigital Library
- Anshuman Dutt and Jayant R. Haritsa. 2014a. Plan bouquets: Query processing without selectivity estimation. In Proc. of the 2014 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’14). 1039--1050. Google ScholarDigital Library
- Anshuman Dutt and Jayant R. Haritsa. 2014b. Query Processing without Estimation. Technical Report TR-2014-01. Database Systems Lab, SERC/CSA, Indian Institute of Science. Retrieved from http://dsl.serc.iisc.ernet.in/publications/TR/TR-2014-01.pdf.Google Scholar
- Fedor V. Fomin and Alexey A. Stepanov. 2007. Counting minimum weighted dominating sets. In Computing and Combinatorics. Lecture Notes in Computer Science, Vol. 4598. Springer, Berlin, 165--175. Google ScholarDigital Library
- Goetz Graefe. 2012. New algorithms for join and grouping operations. Comput. Sci. 27, 1 (Feb. 2012), 3--27. Google ScholarDigital Library
- Goetz Graefe, Wey Guy, Harumi A. Kuno, and Glenn N. Paulley. 2012. Robust query processing (Dagstuhl seminar 12321). Dagstuhl Rep. 2, 8 (2012), 1--15.Google Scholar
- D. Harish, Pooja N. Darera, and Jayant R. Haritsa. 2007. On the production of anorexic plan diagrams. In Proc. of the 33rd Intl. Conf. on Very Large Data Bases (VLDB’07). 1081--1092. Google ScholarDigital Library
- D. Harish, Pooja N. Darera, and Jayant R. Haritsa. 2008. Identifying robust plans through plan diagram reduction. In Proc. VLDB 1, 1 (2008), 1124--1140.Google ScholarDigital Library
- IBM. 2003. Using a SELECTIVITY clause to influence the optimizer. Retrieved from www.ibm.com/developerworks/data/library/tips/dm-0312yip/.Google Scholar
- Yannis E. Ioannidis and Stavros Christodoulakis. 1991. On the propagation of errors in the size of join results. In Proc. of the 1991 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’91). 268--277. Google ScholarDigital Library
- Navin Kabra and David J. DeWitt. 1998. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proc. of the 1998 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’98). 106--117. Google ScholarDigital Library
- Donald E. Knuth. 1997. The Art of Computer Programming, Volume 2 (3rd ed.): Seminumerical Algorithms. Addison-Wesley Professional, Boston, MA, 145--146. Google ScholarDigital Library
- Guy Lohman. 2014. Is Query Optimization a Solved Problem? Retrieved from http://wp.sigmod.org/?author=20.Google Scholar
- Zvi Lotker, Boaz Patt-Shamir, and Dror Rawitz. 2008. Rent, lease or buy: Randomized algorithms for multislope ski rental. In Proc. of the 25th Annual Symposium on Theoretical Aspects of Computer Science. 503--514.Google Scholar
- Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, and Miso Cilimdzic. 2004. Robust query processing through progressive optimization. In Proc. of the 2004 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’04). 659--670. Google ScholarDigital Library
- Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing bad plans by bounding the impact of cardinality estimation errors. In Proc. VLDB 2, 1 (Aug. 2009), 982--993. Google ScholarDigital Library
- Thomas Neumann and César A. Galindo-Legaria. 2013. Taking the edge off cardinality estimation errors using incremental execution. In Datenbanksysteme für Business, Technologie und Web (BTW), 15. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 11.-15.3.2013 in Magdeburg, Germany. Proceedings. 73--92.Google Scholar
- Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2012. Worst-case optimal join algorithms: {Extended abstract}. In Proc. of the 31st Symposium on Principles of Database Systems (PODS’12). 37--48. Google ScholarDigital Library
- Oracle. 2013. Optimizer with Oracle Database 12c. Retrieved from www.oracle.com/ocom/groups/public/@otn/documents/webcontent/1963236.pdf.Google Scholar
- Neoklis Polyzotis. 2005. Selectivity-based partitioning: A divide-and-union paradigm for effective query optimization. In Proc. of the 14th ACM Intl. Conf. on Information and Knowledge Management (CIKM’05). ACM, New York, NY, 720--727. Google ScholarDigital Library
- PostgreSQL. 2009. PostgreSQL 8.4. www.postgresql.org/docs/8.4/static/release.html. (2009).Google Scholar
- Naveen Reddy and Jayant R. Haritsa. 2005. Analyzing plan diagrams of database query optimizers. In Proc. of the 31st Intl. Conf. Very Large Data Bases (VLDB’05). 1228--1239. Google ScholarDigital Library
- P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. 1979. Access path selection in a relational database management system. In Proc. of the 1979 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD’79). 23--34. Google ScholarDigital Library
- SQLSERVER. 2010. Using the USE PLAN Query Hint. Retrieved from technet.microsoft.com/en-us/library/ms186954(v=sql.105).aspx.Google Scholar
- Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2’s learning optimizer. In Proc. of the 27th Intl. Conf. on Very Large Data Bases (VLDB’01). 19--28. Google ScholarDigital Library
- Kostas Tzoumas, Amol Deshpande, and Christian S. Jensen. 2013. Efficiently adapting graphical models for selectivity estimation. VLDB J. 22, 1 (2013), 3--27. Google ScholarDigital Library
- Todd L. Veldhuizen. 2014. Triejoin: A simple, worst-case optimal join algorithm. In Proc. of the 17th Intl. Conf. on Database Theory (ICDT’14). 96--106.Google Scholar
- Wentao Wu, Yun Chi, Shenghuo Zhu, Jun’ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting query execution time: Are optimizer cost models really unusable? In Proc. of the 29th IEEE Intl. Conf. on Data Engg. (ICDE’13). 1081--1092. Google ScholarDigital Library
Index Terms
- Plan Bouquets: A Fragrant Approach to Robust Query Processing
Recommendations
Plan bouquets: query processing without selectivity estimation
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of DataSelectivity estimates for optimizing OLAP queries often differ significantly from those actually encountered during query execution, leading to poor plan choices and inflated response times. We propose here a conceptually new approach to address this ...
Fast and scalable inequality joins
Inequality joins, which is to join relations with inequality conditions, are used in various applications. Optimizing joins has been the subject of intensive research ranging from efficient join algorithms such as sort-merge join, to the use of ...
Selectivity estimation for hybrid queries over text-rich data graphs
EDBT '13: Proceedings of the 16th International Conference on Extending Database TechnologyMany databases today are text-rich, comprising not only structured, but also textual data. Querying such databases involves predicates matching structured data combined with string predicates featuring textual constraints. Based on selectivity estimates ...
Comments