Abstract
The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph-structured data. However, they differ from standard regular expressions in several notable aspects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a nonstandard manner.
We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and r be an expression. We study the complexities of: (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately.
As a side result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.
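To make problem (1) concrete, the following is a minimal sketch of how path existence can be decided in polynomial time when arbitrary paths, possibly revisiting nodes, are allowed to match the expression. It assumes that this is the kind of "alternative semantics" the abstract refers to; it does not implement the W3C semantics. The expression is assumed to be given already as a nondeterministic finite automaton, and the names `path_exists`, `EDGES`, and `NFA` are hypothetical, chosen only for this illustration. The idea is a breadth-first search over pairs (graph node, automaton state).

```python
from collections import deque

def path_exists(edges, nfa, x, y):
    """Return True if some path from x to y spells a word accepted by nfa.

    edges: iterable of (source, label, target) triples (edge-labeled graph).
    nfa:   dict with keys 'start' (a state), 'accept' (set of states) and
           'delta' (dict mapping (state, label) to a set of successor states).
    Paths may repeat nodes and edges (arbitrary-path semantics, an assumption).
    """
    # Index outgoing edges by source node.
    out = {}
    for source, label, target in edges:
        out.setdefault(source, []).append((label, target))

    # Breadth-first search in the product of the graph and the automaton.
    start = (x, nfa["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        if node == y and state in nfa["accept"]:
            return True
        for label, next_node in out.get(node, []):
            for next_state in nfa["delta"].get((state, label), ()):
                pair = (next_node, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return False

# Hypothetical example: the expression a b* as a hand-built two-state NFA.
NFA = {
    "start": 0,
    "accept": {1},
    "delta": {(0, "a"): {1}, (1, "b"): {1}},
}
EDGES = [("x", "a", "u"), ("u", "b", "u"), ("u", "b", "y")]

print(path_exists(EDGES, NFA, "x", "y"))  # path x -a-> u -b-> y matches a b*
print(path_exists(EDGES, NFA, "u", "y"))  # no outgoing a-edge from u
```

Each (node, state) pair is visited at most once, so the search is polynomial in the size of the graph and the automaton. The sketch says nothing about counting paths (problem (2)) or about expressions with negation or numerical occurrence indicators.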