skip to main content
research-article

Static analysis and optimization of semantic web queries

Published:04 December 2013Publication History
Skip Abstract Section

Abstract

Static analysis is a fundamental task in query optimization. In this article we study static analysis and optimization techniques for SPARQL, which is the standard language for querying Semantic Web data. Of particular interest for us is the optionality feature in SPARQL. It is crucial in Semantic Web data management, where data sources are inherently incomplete and the user is usually interested in partial answers to queries. This feature is one of the most complicated constructors in SPARQL and also the one that makes this language depart from classical query languages such as relational conjunctive queries. We focus on the class of well-designed SPARQL queries, which has been proposed in the literature as a fragment of the language with good properties regarding query evaluation. We first propose a tree representation for SPARQL queries, called pattern trees, which captures the class of well-designed SPARQL graph patterns. Among other results, we propose several rules that can be used to transform pattern trees into a simple normal form, and study equivalence and containment. We also study the evaluation and enumeration problems for this class of queries.

Skip Supplemental Material Section

Supplemental Material

References

  1. Abadi, D. J., Marcus, A., Madden, S., and Hollenbach, K. J. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB'07). ACM Press, New York, 411--422. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ajtai, M., Fagin, R., and Stockmeyer, L. J. 2000. The closure of monadic np. J. Comput. Syst. Sci. 60, 3, 660--716. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Angles, R. and Gutierrez, C. 2008. The expressive power of SPARQL. In Proceedings of the 7th International Semantic Web Conference (ISWC'08). Lecture Notes in Computer Science, vol. 5318, Springer, 114--129. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Arenas, M. and Pérez, J. 2011. Querying semantic web data with SPARQL. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'11). ACM Press, New York, 305--316. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Bagan, G., Durand, A., and Grandjean, E. 2007. On acyclic conjunctive queries and constant delay enumeration. In Proceedings of the 21st International Workshop on Computer Science Logic (CSL'07). Lecture Notes in Computer Science, vol. 4646, Springer, 208--222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Berners-Lee, T. 2006. Linked data -- Design issues. http://www.w3.org/DesignIssues/LinkedData.html.Google ScholarGoogle Scholar
  7. Bizer, C., Heath, T., and Berners-Lee, T. 2009. Linked data - The story so far. Int. J. Semantic Web Inf. Syst. 5, 3, 1--22.Google ScholarGoogle ScholarCross RefCross Ref
  8. Chandra, A. K. and Merlin, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the 9th Annual ACM Symposium on Theory of Computing (STOC'77). ACM Press, New York, 77--90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chekol, M. W., Euzenat, J., Geneves, P., and Layaida, N. 2012. SPARQL query containment under shi axioms. In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI'12). AAAI Press.Google ScholarGoogle Scholar
  10. Chekuri, C. and Rajaraman, A. 2000. Conjunctive query containment revisited. Theor. Comput. Sci. 239, 2, 211--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cohen, S., Fadida, I., Kanza, Y., Kimelfeld, B., and Sagiv, Y. 2006. Full disjunctions: Polynomial-delay iterators in action. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB'06). ACM Press, New York, 739--750. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. DBPedia. 2012. DBPedia.org. http://DBpedia.org/sparql.Google ScholarGoogle Scholar
  13. Flum, J., Frick, M., and Grohe, M. 2002. Query evaluation via tree-decompositions. J. ACM 49, 6, 716--752. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Gallego, M. A., Fernandez, J. D., Martinez-Prieto, M. A., and de la Fuente, P. 2011. An empirical study of real-world SPARQL queries. In Proceedings of the 1st International Workshop on Usage Analysis and the Web of Data (USEWOD'11).Google ScholarGoogle Scholar
  15. Gottlob, G., Leone, N., and Scarcello, F. 2000. A comparison of structural CSP decomposition methods. Artif. Intell. 124, 2, 243--282. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Gottlob, G., Leone, N., and Scarcello, F. 2002. Hypertree decompositions and tractable queries. J. Comput. Syst. Sci. 64, 3, 579--627.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Greco, G. and Scarcello, F. 2010a. The power of tree projections: Local consistency, greedy algorithms, and larger islands of tractability. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'10). ACM Press, New York, 327--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Greco, G. and Scarcello, F. 2010b. Structural tractability of enumerating CSP solutions. In Proceedings of the 16th International Conference on Principles and Practice of Constraint Programming (CP'10). Lecture Notes in Computer Science, vol. 6308, Springer, 236--251. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Gutierrez, C., Hurtado, C. A., Mendelzon, A. O., and Pérez, J. 2011. Foundations of semantic web databases. J. Comput. Syst. Sci. 77, 3, 520--541. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. HM GOVERNMENT. 2012. data.gov.uk.http://data.gov.uk.Google ScholarGoogle Scholar
  21. Johnson, D. S., Papadimitriou, C. H., and Yannakakis, M. 1988. On generating all maximal independent sets. Inf. Process. Lett. 27, 3, 119--123. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Kanza, Y., Nutt, W., and Sagiv, Y. 2002. Querying incomplete information in semi-structured data. J. Comput. Syst. Sci. 64, 3, 655--693.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Larson, P.-A. and Zhou, J. 2005. View matching for outer-join views. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). ACM Press, New York, 445--456. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Lassila, O. and Swick, R. R. 1999. Resource description framework (RDF) model and syntax. W3C Recommendation. http://www.w3.org/TR/PR-rdf-syntax.Google ScholarGoogle Scholar
  25. Letelier, A., Pérez, J., Pichler, R., and Skritek, S. 2012a. SPAM: A SPARQL analysis and manipulation tool. Proc. VLDB Endow. 5, 12, 1958--1961. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Letelier, A., Pérez, J., Pichler, R., and Skritek, S. 2012b. Static analysis and optimization of semantic web queries. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'12). ACM Press, New York, 89--100. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Mallea, A., Arenas, M., Hogan, A., and Polleres, A. 2011. On blank nodes. In Proceedings of the International Semantic Web Conference. 421--437. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Neumann, T. and Weikum, G. 2010. The RDF-3x engine for scalable management of RDF data. Int. J. VLDB 19, 1, 91--113. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Pérez, J., Arenas, M., and Gutierrez, C. 2006a. Semantics and complexity of SPARQL. In Proceedings of the 5th International Semantic Web Conference (ISWC'06). Lecture Notes in Computer Science, vol. 4273, Springer, 30--43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Pérez, J., Arenas, M., and Gutierrez, C. 2006b. Semantics of SPARQL. Tech. rep. TR/DCC-2006-17, Universidad de Chile. http://users.dcc.uchile.cl/∼jperez/papers/sparql.semantics.pdf.Google ScholarGoogle Scholar
  31. Pérez, J., Arenas, M., and Gutierrez, C. 2009. Semantics and complexity of SPARQL. ACM Trans. Datab. Syst. 34, 3. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Picalausa, F. and Vansummeren, S. 2011. What are real sparql queries like? In Proceedings of the International Workshop on Semantic Web Information Management (SWIM'11). ACM Press, New York, 7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Polleres, A. 2007. From sparql to rules (and back). In Proceedings of the 16th International Conference on World Wide Web (WWW'07). ACM Press, New York, 787--796. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Prudhommeaux, E. and Seaborne, A. 2008. SPARQL query language for RDF. W3C Recommendation. http://www.w3.org/TR/rdf-sparql-query/.Google ScholarGoogle Scholar
  35. Schmidt, M., Hornung, T., Kuchlin, N., Lausen, G., and Pinkel, C. 2008. An experimental comparison of RDF data management approaches in a SPARQL benchmark scenario. In Proceedings of the 7th International Semantic Web Conference (ISWC'08). Lecture Notes in Computer Science, vol. 5318, Springer, 82--97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Schmidt, M., Meier, M., and Lausen, G. 2010. Foundations of SPARQL query optimization. In Proceedings of the 13th International Conference on Database Theory (ICDT'10). ACM Press, New York, 4--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Serfiotis, G., Koffina, I., Christophides, V., and Tannen, V. 2005. Containment and minimization of RDF/s query patterns. In Proceedings of the 4th International Semantic Web Conference (ISWC'05). Lecture Notes in Computer Science, vol. 3729, Springer, 607--623. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Sidirourgos, L., Goncalves, R., Kersten, M. L., Nes, N., and Manegold, S. 2008. Columnstore support for RDF data management: Not all swans are white. Proc. VLDB Endow. 1, 2, 1553--1563. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., and Reynolds, D. 2008. SPARQL basic graph pattern optimization using selectivity estimation. In Proceedings of the 17th International Conference on World Wide Web (WWW'08). ACM Press, New York, 595--604. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Ullman, J. D. 1997. Information integration using logical views. In Proceedings of the 6th International Conference on Database Theory (ICDT'97). Lecture Notes in Computer Science, vol. 1186, Springer, 19--40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. US GOVERNMENT. 2012. data.gov.http://www.data.gov.Google ScholarGoogle Scholar
  42. Weiss, C., Karras, P., and Bernstein, A. 2008. Hexastore: Sextuple indexing for semantic web data management. Proc. VLDB Endow. 1, 1, 1008--1019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Yannakakis, M. 1981. Algorithms for acyclic database schemes. In Proceedings of the 7th International Conference on Very Large Data Bases (VLDB'81). 82--94. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Static analysis and optimization of semantic web queries

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Database Systems
      ACM Transactions on Database Systems  Volume 38, Issue 4
      Invited papers issue
      November 2013
      294 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/2539032
      Issue’s Table of Contents

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 4 December 2013
      • Accepted: 1 June 2013
      • Revised: 1 April 2013
      • Received: 1 October 2012
      Published in tods Volume 38, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader