Abstract
Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, that is, graph patterns. While queries need to be posed against such data, techniques for querying patterns are generally lacking, and properties of such queries are not well understood.
Our goal is to study the basics of querying graph patterns. The key features of the patterns we consider are node and label variables, and edges specified by regular expressions. We provide a classification of patterns and study standard graph queries on graph patterns. We give precise characterizations of both data and combined complexity for each class of patterns. Where complexity is high, we further analyze the features that lead to intractability and identify lower-complexity restrictions. Since our patterns are based on regular expressions, query answering for them can be captured by a new automata model. These automata have two modes of acceptance: one captures queries returning nodes, and the other captures queries returning paths. We study properties of such automata and the key computational tasks associated with them. Finally, we provide additional restrictions for tractability, and show that some intractable cases can be naturally cast as instances of constraint satisfaction problems.
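To make the notion of an edge specified by a regular expression concrete, the following is a minimal sketch of evaluating a single regular path query on a fully specified edge-labelled graph. It is an illustration only, not the automata model or the algorithms of the paper: it ignores node and label variables and does not compute answers over all graphs represented by a pattern. The names `rpq_targets`, `NFA_AB_STAR`, and `EDGES` are hypothetical, and the regular expression is assumed to be given already as a nondeterministic finite automaton.

```python
from collections import deque


def rpq_targets(edges, nfa, source):
    """Return the nodes reachable from `source` along a path whose
    sequence of edge labels is accepted by the given NFA."""
    # Adjacency list: node -> list of (label, next_node).
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))

    start, finals, delta = nfa["start"], nfa["finals"], nfa["delta"]

    # Breadth-first search over (graph node, automaton state) pairs.
    seen = {(source, start)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in finals:
            answers.add(node)
        for label, nxt in adj.get(node, []):
            for state2 in delta.get((state, label), ()):
                pair = (nxt, state2)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Hypothetical NFA for the regular expression a b* (an assumption for this example).
NFA_AB_STAR = {"start": 0, "finals": {1}, "delta": {(0, "a"): {1}, (1, "b"): {1}}}

# A small, fully specified graph given as (source, label, target) triples.
EDGES = [("n1", "a", "n2"), ("n2", "b", "n3"), ("n3", "b", "n4"), ("n1", "c", "n5")]

print(sorted(rpq_targets(EDGES, NFA_AB_STAR, "n1")))
```

The search explores the product of the graph and the automaton, so each (node, state) pair is visited at most once. Querying patterns is harder than this because a pattern stands for many possible graphs at once.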
Supplemental Material
The proof is given in an electronic appendix, available online in the ACM Digital Library.