Abstract
Static analysis is a fundamental task in query optimization. In this article we study static analysis and optimization techniques for SPARQL, which is the standard language for querying Semantic Web data. Of particular interest for us is the optionality feature in SPARQL. It is crucial in Semantic Web data management, where data sources are inherently incomplete and the user is usually interested in partial answers to queries. This feature is one of the most complicated constructors in SPARQL and also the one that makes this language depart from classical query languages such as relational conjunctive queries. We focus on the class of well-designed SPARQL queries, which has been proposed in the literature as a fragment of the language with good properties regarding query evaluation. We first propose a tree representation for SPARQL queries, called pattern trees, which captures the class of well-designed SPARQL graph patterns. Among other results, we propose several rules that can be used to transform pattern trees into a simple normal form, and study equivalence and containment. We also study the evaluation and enumeration problems for this class of queries.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, Static analysis and optimization of semantic web queries
- Abadi, D. J., Marcus, A., Madden, S., and Hollenbach, K. J. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB'07). ACM Press, New York, 411--422. Google ScholarDigital Library
- Ajtai, M., Fagin, R., and Stockmeyer, L. J. 2000. The closure of monadic np. J. Comput. Syst. Sci. 60, 3, 660--716. Google ScholarDigital Library
- Angles, R. and Gutierrez, C. 2008. The expressive power of SPARQL. In Proceedings of the 7th International Semantic Web Conference (ISWC'08). Lecture Notes in Computer Science, vol. 5318, Springer, 114--129. Google ScholarDigital Library
- Arenas, M. and Pérez, J. 2011. Querying semantic web data with SPARQL. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'11). ACM Press, New York, 305--316. Google ScholarDigital Library
- Bagan, G., Durand, A., and Grandjean, E. 2007. On acyclic conjunctive queries and constant delay enumeration. In Proceedings of the 21st International Workshop on Computer Science Logic (CSL'07). Lecture Notes in Computer Science, vol. 4646, Springer, 208--222. Google ScholarDigital Library
- Berners-Lee, T. 2006. Linked data -- Design issues. http://www.w3.org/DesignIssues/LinkedData.html.Google Scholar
- Bizer, C., Heath, T., and Berners-Lee, T. 2009. Linked data - The story so far. Int. J. Semantic Web Inf. Syst. 5, 3, 1--22.Google ScholarCross Ref
- Chandra, A. K. and Merlin, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the 9th Annual ACM Symposium on Theory of Computing (STOC'77). ACM Press, New York, 77--90. Google ScholarDigital Library
- Chekol, M. W., Euzenat, J., Geneves, P., and Layaida, N. 2012. SPARQL query containment under shi axioms. In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI'12). AAAI Press.Google Scholar
- Chekuri, C. and Rajaraman, A. 2000. Conjunctive query containment revisited. Theor. Comput. Sci. 239, 2, 211--229. Google ScholarDigital Library
- Cohen, S., Fadida, I., Kanza, Y., Kimelfeld, B., and Sagiv, Y. 2006. Full disjunctions: Polynomial-delay iterators in action. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB'06). ACM Press, New York, 739--750. Google ScholarDigital Library
- DBPedia. 2012. DBPedia.org. http://DBpedia.org/sparql.Google Scholar
- Flum, J., Frick, M., and Grohe, M. 2002. Query evaluation via tree-decompositions. J. ACM 49, 6, 716--752. Google ScholarDigital Library
- Gallego, M. A., Fernandez, J. D., Martinez-Prieto, M. A., and de la Fuente, P. 2011. An empirical study of real-world SPARQL queries. In Proceedings of the 1st International Workshop on Usage Analysis and the Web of Data (USEWOD'11).Google Scholar
- Gottlob, G., Leone, N., and Scarcello, F. 2000. A comparison of structural CSP decomposition methods. Artif. Intell. 124, 2, 243--282. Google ScholarDigital Library
- Gottlob, G., Leone, N., and Scarcello, F. 2002. Hypertree decompositions and tractable queries. J. Comput. Syst. Sci. 64, 3, 579--627.Google ScholarDigital Library
- Greco, G. and Scarcello, F. 2010a. The power of tree projections: Local consistency, greedy algorithms, and larger islands of tractability. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'10). ACM Press, New York, 327--338. Google ScholarDigital Library
- Greco, G. and Scarcello, F. 2010b. Structural tractability of enumerating CSP solutions. In Proceedings of the 16th International Conference on Principles and Practice of Constraint Programming (CP'10). Lecture Notes in Computer Science, vol. 6308, Springer, 236--251. Google ScholarDigital Library
- Gutierrez, C., Hurtado, C. A., Mendelzon, A. O., and Pérez, J. 2011. Foundations of semantic web databases. J. Comput. Syst. Sci. 77, 3, 520--541. Google ScholarDigital Library
- HM GOVERNMENT. 2012. data.gov.uk.http://data.gov.uk.Google Scholar
- Johnson, D. S., Papadimitriou, C. H., and Yannakakis, M. 1988. On generating all maximal independent sets. Inf. Process. Lett. 27, 3, 119--123. Google ScholarDigital Library
- Kanza, Y., Nutt, W., and Sagiv, Y. 2002. Querying incomplete information in semi-structured data. J. Comput. Syst. Sci. 64, 3, 655--693.Google ScholarDigital Library
- Larson, P.-A. and Zhou, J. 2005. View matching for outer-join views. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). ACM Press, New York, 445--456. Google ScholarDigital Library
- Lassila, O. and Swick, R. R. 1999. Resource description framework (RDF) model and syntax. W3C Recommendation. http://www.w3.org/TR/PR-rdf-syntax.Google Scholar
- Letelier, A., Pérez, J., Pichler, R., and Skritek, S. 2012a. SPAM: A SPARQL analysis and manipulation tool. Proc. VLDB Endow. 5, 12, 1958--1961. Google ScholarDigital Library
- Letelier, A., Pérez, J., Pichler, R., and Skritek, S. 2012b. Static analysis and optimization of semantic web queries. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'12). ACM Press, New York, 89--100. Google ScholarDigital Library
- Mallea, A., Arenas, M., Hogan, A., and Polleres, A. 2011. On blank nodes. In Proceedings of the International Semantic Web Conference. 421--437. Google ScholarDigital Library
- Neumann, T. and Weikum, G. 2010. The RDF-3x engine for scalable management of RDF data. Int. J. VLDB 19, 1, 91--113. Google ScholarDigital Library
- Pérez, J., Arenas, M., and Gutierrez, C. 2006a. Semantics and complexity of SPARQL. In Proceedings of the 5th International Semantic Web Conference (ISWC'06). Lecture Notes in Computer Science, vol. 4273, Springer, 30--43. Google ScholarDigital Library
- Pérez, J., Arenas, M., and Gutierrez, C. 2006b. Semantics of SPARQL. Tech. rep. TR/DCC-2006-17, Universidad de Chile. http://users.dcc.uchile.cl/∼jperez/papers/sparql.semantics.pdf.Google Scholar
- Pérez, J., Arenas, M., and Gutierrez, C. 2009. Semantics and complexity of SPARQL. ACM Trans. Datab. Syst. 34, 3. Google ScholarDigital Library
- Picalausa, F. and Vansummeren, S. 2011. What are real sparql queries like? In Proceedings of the International Workshop on Semantic Web Information Management (SWIM'11). ACM Press, New York, 7. Google ScholarDigital Library
- Polleres, A. 2007. From sparql to rules (and back). In Proceedings of the 16th International Conference on World Wide Web (WWW'07). ACM Press, New York, 787--796. Google ScholarDigital Library
- Prudhommeaux, E. and Seaborne, A. 2008. SPARQL query language for RDF. W3C Recommendation. http://www.w3.org/TR/rdf-sparql-query/.Google Scholar
- Schmidt, M., Hornung, T., Kuchlin, N., Lausen, G., and Pinkel, C. 2008. An experimental comparison of RDF data management approaches in a SPARQL benchmark scenario. In Proceedings of the 7th International Semantic Web Conference (ISWC'08). Lecture Notes in Computer Science, vol. 5318, Springer, 82--97. Google ScholarDigital Library
- Schmidt, M., Meier, M., and Lausen, G. 2010. Foundations of SPARQL query optimization. In Proceedings of the 13th International Conference on Database Theory (ICDT'10). ACM Press, New York, 4--33. Google ScholarDigital Library
- Serfiotis, G., Koffina, I., Christophides, V., and Tannen, V. 2005. Containment and minimization of RDF/s query patterns. In Proceedings of the 4th International Semantic Web Conference (ISWC'05). Lecture Notes in Computer Science, vol. 3729, Springer, 607--623. Google ScholarDigital Library
- Sidirourgos, L., Goncalves, R., Kersten, M. L., Nes, N., and Manegold, S. 2008. Columnstore support for RDF data management: Not all swans are white. Proc. VLDB Endow. 1, 2, 1553--1563. Google ScholarDigital Library
- Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., and Reynolds, D. 2008. SPARQL basic graph pattern optimization using selectivity estimation. In Proceedings of the 17th International Conference on World Wide Web (WWW'08). ACM Press, New York, 595--604. Google ScholarDigital Library
- Ullman, J. D. 1997. Information integration using logical views. In Proceedings of the 6th International Conference on Database Theory (ICDT'97). Lecture Notes in Computer Science, vol. 1186, Springer, 19--40. Google ScholarDigital Library
- US GOVERNMENT. 2012. data.gov.http://www.data.gov.Google Scholar
- Weiss, C., Karras, P., and Bernstein, A. 2008. Hexastore: Sextuple indexing for semantic web data management. Proc. VLDB Endow. 1, 1, 1008--1019. Google ScholarDigital Library
- Yannakakis, M. 1981. Algorithms for acyclic database schemes. In Proceedings of the 7th International Conference on Very Large Data Bases (VLDB'81). 82--94. Google ScholarDigital Library
Index Terms
- Static analysis and optimization of semantic web queries
Recommendations
Containment and equivalence of well-designed SPARQL
PODS '14: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsQuery containment and query equivalence constitute important computational problems in the context of static query analysis and optimization. While these problems have been intensively studied for fragments of relational calculus, almost no works exist ...
Querying semantic web data with SPARQL
PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsThe Semantic Web is the initiative of the W3C to make information on the Web readable not only by humans but also by machines. RDF is the data model for Semantic Web data, and SPARQL is the standard query language for this data model. In the last ten ...
Static analysis and optimization of semantic web queries
PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database SystemsStatic analysis is a fundamental task in query optimization. In this paper we study static analysis and optimization techniques for SPARQL, which is the standard language for querying Semantic Web data. Of particular interest for us is the optionality ...
Comments