research-article

The complexity of regular expressions and property paths in SPARQL

Published: 4 December 2013

Abstract

The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph-structured data. However, they differ from standard regular expressions in several notable aspects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a nonstandard manner.
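Here is one way to make "syntactic sugar" concrete. An occurrence indicator r{n,m} can be rewritten using only concatenation and optional groups: n mandatory copies of r followed by m - n optional copies. The sketch below makes several assumptions. Edge labels are encoded as single characters. Python's `re` syntax stands in for property-path syntax. The helper `expand_counter` is hypothetical.

```python
import re


def expand_counter(sub: str, n: int, m: int) -> str:
    """Rewrite sub{n,m} as n required copies of sub plus m - n optional copies.

    Hypothetical helper: assumes 0 <= n <= m and that `sub` is a
    self-contained regex fragment (here, single-character edge labels).
    """
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    group = f"(?:{sub})"
    return group * n + f"{group}?" * (m - n)


expanded = expand_counter("a", 2, 4)
print(expanded)  # (?:a)(?:a)(?:a)?(?:a)?
```

The rewritten expression grows linearly with m. If m is written in binary, the expansion can therefore be exponentially larger than the original expression. Counters add no expressive power, but an algorithm that simply expands them may pay for this blow-up.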

We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and r be an expression. We study the complexities of: (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately.
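Problems (1) and (2) are easiest to picture on a small graph. The sketch below uses the standard product construction, which pairs each graph node with a state of an automaton for r. It works under a walk-based semantics in which nodes may repeat. This is a common polynomial-time approach, shown only for contrast. It is not the W3C semantics, and it is not claimed to match the paper's exact definition of the alternative semantics.

Several parts of the sketch are hypothetical: the graph, the hand-built DFA for r = a b*, and the function names. Counting is limited to walks with exactly k edges, because cycles can make the number of matching walks infinite.

```python
from collections import deque

# Hypothetical graph format: node -> list of (edge label, target node).
G = {
    "x": [("a", "u")],
    "u": [("b", "u"), ("b", "y"), ("b", "w")],
    "w": [("b", "y")],
    "y": [],
}

# Hand-built DFA for the example expression r = a b*.
# DELTA maps (state, label) -> state; a missing entry means "reject".
START, ACCEPTING = 0, {1}
DELTA = {(0, "a"): 1, (1, "b"): 1}


def exists_path(graph, x, y):
    """Problem (1): BFS over product nodes (graph node, DFA state)."""
    start = (x, START)
    seen = {start}
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        if node == y and state in ACCEPTING:
            return True
        for label, target in graph.get(node, []):
            nxt = DELTA.get((state, label))
            if nxt is not None and (target, nxt) not in seen:
                seen.add((target, nxt))
                queue.append((target, nxt))
    return False


def count_walks(graph, x, y, k):
    """Bounded problem (2): count walks with exactly k edges from x to y matching r."""
    counts = {(x, START): 1}
    for _ in range(k):
        step = {}
        for (node, state), c in counts.items():
            for label, target in graph.get(node, []):
                nxt = DELTA.get((state, label))
                if nxt is not None:
                    step[(target, nxt)] = step.get((target, nxt), 0) + c
        counts = step
    return sum(c for (node, state), c in counts.items()
               if node == y and state in ACCEPTING)


print(exists_path(G, "x", "y"))     # True
print(count_walks(G, "x", "y", 3))  # 2 (x-u-u-y and x-u-w-y)
```

Using a deterministic automaton matters for counting. Each walk has exactly one label word, and a DFA has at most one run on that word. Summing over product states therefore counts each walk once. With a nondeterministic automaton, the same sum would count a walk once per accepting run.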

As a side result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.
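The membership problem asks whether a given word w matches a given expression, here one with numerical occurrence indicators and negation. The dynamic-programming sketch below is not the paper's algorithm, but it shows why counters never need to be expanded.

For each subexpression and each substring w[i:j], the sketch records whether the substring matches. For r{lo,hi}, it tracks how many nonempty pieces the substring can be split into, and that number is at most the substring's length. The work is therefore polynomial in |w| and the expression size, independent of how large lo and hi are.

The tuple encoding and the names `matches` and `pieces` are hypothetical, and single characters stand for edge labels.

```python
from functools import lru_cache

# Hypothetical AST encoding (tuples, so they are hashable for caching):
#   ("eps",)  ("sym", c)  ("cat", r, s)  ("alt", r, s)  ("star", r)
#   ("count", r, lo, hi) for r{lo,hi} with 0 <= lo <= hi;  ("not", r) for complement


def matches(expr, w):
    """Decide whether the whole word w is in L(expr) by DP over substrings w[i:j]."""

    @lru_cache(maxsize=None)
    def m(e, i, j):
        kind = e[0]
        if kind == "eps":
            return i == j
        if kind == "sym":
            return j == i + 1 and w[i] == e[1]
        if kind == "alt":
            return m(e[1], i, j) or m(e[2], i, j)
        if kind == "cat":
            return any(m(e[1], i, h) and m(e[2], h, j) for h in range(i, j + 1))
        if kind == "not":
            return not m(e[1], i, j)
        if kind == "star":
            # Empty pieces never help, so split off one nonempty piece at a time.
            return i == j or any(m(e[1], i, h) and m(e, h, j)
                                 for h in range(i + 1, j + 1))
        if kind == "count":
            _, r, lo, hi = e
            ts = pieces(r, i, j)
            if m(r, i, i):
                # r accepts the empty word: pad with empty pieces up to max(t, lo).
                return any(t <= hi for t in ts)
            return any(lo <= t <= hi for t in ts)
        raise ValueError(f"unknown node {kind!r}")

    @lru_cache(maxsize=None)
    def pieces(r, i, j):
        """All t such that w[i:j] splits into t nonempty pieces, each in L(r)."""
        if i == j:
            return frozenset({0})
        out = set()
        for h in range(i + 1, j + 1):
            if m(r, i, h):
                out.update(t + 1 for t in pieces(r, h, j))
        return frozenset(out)

    return m(expr, 0, len(w))


A = ("sym", "a")
print(matches(("not", ("count", A, 2, 3)), "aa"))   # False
print(matches(("count", A, 1, 10**18), "aaaa"))     # True
```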

            Published in

              ACM Transactions on Database Systems, Volume 38, Issue 4 (Invited papers issue), November 2013, 294 pages
              ISSN: 0362-5915
              EISSN: 1557-4644
              DOI: 10.1145/2539032

              Copyright © 2013 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 4 December 2013
              • Accepted: 1 May 2013
              • Received: 1 September 2012
              Published in TODS, Volume 38, Issue 4

              Qualifiers

              • research-article
              • Research
              • Refereed
