Abstract
A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data are first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We study the complexity of parallel-correctness for conjunctive queries as well as transferability of parallel-correctness between queries. We also investigate the complexity of transferability for certain families of distribution policies, including the Hypercube distribution policies.
- Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley.Google Scholar
- Foto N. Afrati, Paraschos Koutris, Dan Suciu, and Jeffrey D. Ullman. 2012. Parallel skyline queries. In Proceedings of the 15th International Conference on Database Theory (ICDT’12). ACM, 274--284. DOI:http://dx.doi.org/10.1145/2274576.2274605 Google ScholarDigital Library
- Foto N. Afrati and Jeffrey D. Ullman. 2010. Optimizing joins in a map-reduce environment. In Proceedings of the 13th International Conference on Extending Database Technology (EDBT’10). ACM, 99--110. DOI:http://dx.doi.org/10.1145/1739041.1739056 Google ScholarDigital Library
- Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. 2015. Parallel-correctness and transferability for conjunctive queries. In Proceedings of the 34th ACM Symposium on Principles of Database Systems (PODS’15). ACM, 47--58. DOI:http://dx.doi.org/10.1145/2745754.2745759 Google ScholarDigital Library
- Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. 2016. Data partitioning for single-round multi-join evaluation in massively parallel systems. SIGMOD Rec. 45, 1 (2016), 33--40. DOI:http://dx.doi.org/10.1145/2949741.2949750 Google ScholarDigital Library
- Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. 2017. Reasoning on data partitioning for single-round multi-join evaluation in massively parallel systems. Commun. ACM 60, 3 (Feb. 2017), 93--100. 0001-0782DOI:http://dx.doi.org/10.1145/3041063 Google ScholarDigital Library
- Tom J. Ameloot, Bas Ketsman, Frank Neven, and Daniel Zinn. 2014. Weaker forms of monotonicity for declarative networking: A more fine-grained answer to the CALM-conjecture. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’14). ACM, 64--75. DOI:http://dx.doi.org/10.1145/2594538.2594541 Google ScholarDigital Library
- Albert Atserias, Martin Grohe, and Dániel Marx. 2013. Size bounds and query plans for relational joins. SIAM J. Comput. 42, 4 (2013), 1737--1767. DOI:http://dx.doi.org/10.1137/110859440 Preliminary version in FOCS 08. Google ScholarDigital Library
- Paul Beame, Paraschos Koutris, and Dan Suciu. 2013. Communication steps for parallel query processing. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’13). ACM, 273--284. DOI:http://dx.doi.org/10.1145/2463664.2465224 Google ScholarDigital Library
- Paul Beame, Paraschos Koutris, and Dan Suciu. 2014. Skew in parallel query processing. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’14). ACM, 212--223. DOI:http://dx.doi.org/10.1145/2594538.2594558 Google ScholarDigital Library
- Ashok K. Chandra and Philip M. Merlin. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the 9th Annual ACM Symposium on Theory of Computing (STOC’77). ACM, 77--90. DOI:http://dx.doi.org/10.1145/800105.803397 Google ScholarDigital Library
- Shumo Chu, Magdalena Balazinska, and Dan Suciu. 2015. From theory to practice: Efficient join query evaluation in a parallel database system. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD’15). ACM, 63--78. DOI:http://dx.doi.org/10.1145/2723372.2750545 Google ScholarDigital Library
- Michael Fellows. 1988. Planar Emulators and Planar Covers. (Unpublished manuscript).Google Scholar
- Jörg Flum, Markus Frick, and Martin Grohe. 2002. Query evaluation via tree-decompositions. J. ACM 49, 6 (2002), 716--752. DOI:http://dx.doi.org/10.1145/602220.602222 Google ScholarDigital Library
- Sumit Ganguly, Abraham Silberschatz, and Shalom Tsur. 1990. A framework for the parallel processing of datalog queries. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD’90). ACM Press, 143--152. DOI:http://dx.doi.org/10.1145/93597.98724 Google ScholarDigital Library
- Sumit Ganguly, Abraham Silberschatz, and Shalom Tsur. 1992. Parallel bottom-up processing of datalog queries. J. Log. Program. 14, 182 (1992), 101--126. DOI:http://dx.doi.org/10.1016/0743-1066(92)90048-8 Google ScholarDigital Library
- Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. 2016. Parallel-correctness and containment for conjunctive queries with union and negation. In Proceedings of the 19th International Conference on Database Theory (ICDT’16). 9:1--9:17. DOI:http://dx.doi.org/10.4230/LIPIcs.ICDT.2016.9Google Scholar
- Pavol Hell and Jaroslav Nesetril. 1992. The core of a graph. Discr. Math. 109, 1-3 (1992), 117--126. DOI:http://dx.doi.org/10.1016/0012-365X(92)90282-K Google ScholarDigital Library
- Bas Ketsman and Dan Suciu. 2017. A worst-case optimal multi-round algorithm for parallel computation of conjunctive queries. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’17). 417--428. DOI:http://dx.doi.org/10.1145/3034786.3034788 Google ScholarDigital Library
- Shigeru Kitakubo. 1991. Planar branched coverings of graphs. Yokohama Math. J. 38, 2 (1991), 113--120.Google Scholar
- Paraschos Koutris, Paul Beame, and Dan Suciu. 2016. Worst-case optimal algorithms for parallel query processing. In Proceedings of the 19th International Conference on Database Theory, (ICDT 2016, Bordeaux, France, March 15-18, 2016. 8:1--8:18. DOI:http://dx.doi.org/10.4230/LIPIcs.ICDT.2016.8Google Scholar
- Paraschos Koutris and Dan Suciu. 2011. Parallel evaluation of conjunctive queries. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS’11. ACM, 223--234. DOI:http://dx.doi.org/10.1145/1989284.1989310 Google ScholarDigital Library
- Spark 2014. Spark. (2014). http://spark.apache.org.Google Scholar
- Larry J. Stockmeyer. 1976. The polynomial-time hierarchy. Theor. Comput. Sci. 3, 1 (1976), 1--22. DOI:http://dx.doi.org/10.1016/0304-3975(76)90061-X Google ScholarCross Ref
- Gabriel Valiente. 2001. A general method for graph isomorphism. In Proceedings of the 13th International Symposium on Fundamentals of Computation Theory (FCT’01). 428--431. DOI:http://dx.doi.org/10.1007/3-540-44669-9_49 Google ScholarCross Ref
- Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2013. Shark: SQL and rich analytics at scale. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, 13--24. DOI:http://dx.doi.org/10.1145/2463676.2465288 Google ScholarDigital Library
- Mihalis Yannakakis. 1981. Algorithms for acyclic database schemes. IEEE Press, 82--94.Google Scholar
- Daniel Zinn, Todd J. Green, and Bertram Ludäscher. 2012. Win-move is coordination-free (sometimes). In Proceedings of the 15th International Conference on Database Theory (ICDT’12). ACM, 99--113. DOI:http://dx.doi.org/10.1145/2274576.2274588 Google ScholarDigital Library
Index Terms
- Parallel-Correctness and Transferability for Conjunctive Queries
Recommendations
Parallel-Correctness and Transferability for Conjunctive Queries
PODS '15: Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsA dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers ...
Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing ...
Parallel evaluation of conjunctive queries
PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsThe availability of large data centers with tens of thousands of servers has led to the popular adoption of massive parallelism for data analysis on large datasets. Several query languages exist for running queries on massively parallel architectures, ...
Comments