skip to main content
research-article

Querying Graphs with Data

Authors Info & Claims
Published:20 March 2016Publication History
Skip Abstract Section

Abstract

Graph databases have received much attention as of late due to numerous applications in which data is naturally viewed as a graph; these include social networks, RDF and the Semantic Web, biological databases, and many others. There are many proposals for query languages for graph databases that mainly fall into two categories. One views graphs as a particular kind of relational data and uses traditional relational mechanisms for querying. The other concentrates on querying the topology of the graph. These approaches, however, lack the ability to combine data and topology, which would allow queries asking how data changes along paths and patterns enveloping it.

In this article, we present a comprehensive study of languages that enable such combination of data and topology querying. These languages come in two flavors. The first follows the standard approach of path queries, which specify how labels of edges change along a path, but now we extend them with ways of specifying how both labels and data change. From the complexity point of view, the right type of formalisms are subclasses of register automata. These, however, are not well suited for querying. To overcome this, we develop several types of extended regular expressions to specify paths with data and study their querying power and complexity. The second approach adopts the popular XML language XPath and extends it from XML documents to graphs. Depending on the exact set of allowed features, we have a family of languages, and our study shows that it includes efficient and highly expressive formalisms for querying both the structure of the data and the data itself.

Skip Supplemental Material Section

Supplemental Material

References

  1. S. Abiteboul, R. Hull, and V. Vianu. 1995. Foundations of Databases. Addison-Wesley.Google ScholarGoogle Scholar
  2. S. Abiteboul, L. Segoufin, and V. Vianu. 2006. Representing and querying XML with incomplete information. ACM Trans. Database Syst. 31, 1 (2006), 208--254.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Abiteboul and V. Vianu. 1999. Regular path queries with constraints. J. Comput. Syst. Sci. 3, 58 (1999), 428--452.Google ScholarGoogle Scholar
  4. A. V. Aho. 1990. Handbook of Theoretical Computer Science. 255--300 pages.Google ScholarGoogle Scholar
  5. N. Alechina, S. Demri, and M. de Rijke. 2003. A modal perspective on path constraints. J. Log. Comput. 13, 6 (2003), 939--956.Google ScholarGoogle Scholar
  6. N. Alechina and N. Immerman. 2000. Reachability logic: An efficient fragment of transitive closure logic. Logic J. IGPL 8, 3 (2000), 325--337.Google ScholarGoogle ScholarCross RefCross Ref
  7. H. Andréka, I. Németi, and I. Sain. 2001. Algebraic logic. In Handbook of Philosophical Logic (2nd ed.). Vol. 2. Springer.Google ScholarGoogle Scholar
  8. R. Angles and C. Gutierrez. 2008. Survey of graph database models. Comput. Surv. 40, 1 (2008), 1:1--1:39.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. Barceló. 2013. Querying graph databases. In 32th ACM Symposium on Principles of Database Systems (PODS’13).Google ScholarGoogle Scholar
  10. P. Barceló, D. Figueira, and L. Libkin. 2012a. Graph logics with rational relations and the generalized intersection problem. In 27th Annual IEEE Symposium on Logic in Computer Science (LICS’12).Google ScholarGoogle Scholar
  11. P. Barceló, L. Libkin, A. W. Lin, and P. T. Wood. 2012b. Expressive languages for path queries over graph-structured data. ACM TODS 38, 4 (2012), 31:1--31:46.Google ScholarGoogle Scholar
  12. P. Barceló, L. Libkin, A. Poggi, and C. Sirangelo. 2010. XML with incomplete information. J. ACM 58, 1 (2010), 1--62.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Barceló, L. Libkin, and J. L. Reutter. 2014. Querying regular graph patterns. J. ACM 61, 1 (2014), 8:1--8:54.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Barceló, J. Pérez, and J. L. Reutter. 2012c. Relative expressiveness of nested regular expressions. In Proceedings of the 6th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW). 180--195.Google ScholarGoogle Scholar
  15. P. Barceló, R. Pichler, and S. Skritek. 2015. Efficient evaluation and approximation of well-designed pattern trees. In ACM Symposium on Principles of Database Systems (PODS’15). 131--144.Google ScholarGoogle Scholar
  16. M. Benedikt, W. Fan, and F. Geerts. 2008. XPath satisfiability in the presence of DTDs. J. ACM 55, 2 (2008), 8:1--8:79.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Bojanczyk. 2010. Automata for data words and data trees. In Proceedings of the 21st International Conference on Rewriting Techniques and Applications (RTA).Google ScholarGoogle ScholarCross RefCross Ref
  18. M. Bojanczyk, C. David, A. Muscholl, T. Schwentick, and L. Segoufin. 2011. Two-variable logic on words with data. ACM TOCL 12, 4 (2011), 27:1--27:26.Google ScholarGoogle Scholar
  19. M. Bojanczyk, A. Muscholl, T. Schwentick, and L. Segoufin. 2009. Two-variable logic on data trees and XML reasoning. J. ACM 56, 3 (2009), 13:1--13:48.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Bojanczyk and P. Parys. 2011. XPath evaluation in linear time. J. ACM 58, 4 (2011), 17:1--17:33.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Bouyer, A. Petit, and D. Thérien. 2001. An algebraic characterization of data and timed languages. In International Conference on Concurrency Theory (CONCUR). 248--261.Google ScholarGoogle Scholar
  22. D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. 2000. Containment of conjunctive regular path queries with inverse. In 7th International Conference on Principles of Knowledge Representation and Reasoning (KR’00). 176--185.Google ScholarGoogle Scholar
  23. D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. 2009. An automata-theoretic approach to regular XPath. In 12th International Symposium on Database Programming Languages (DBLP). 18--35.Google ScholarGoogle Scholar
  24. S. Cassidy. 2003. Generalizing XPath for directed graphs. In Extreme Markup Languages.Google ScholarGoogle Scholar
  25. R. Cleaveland and B. Steffen. 1993. A linear-time model-checking algorithm for the alternation-free modal mu-calculus. Formal Methods Syst. Design 2, 2 (1993), 121--147.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Consens and A. O. Mendelzon. 1990. GraphLog: A visual formalism for real life recursion. In 9th ACM Symposium on Principles of Database Systems (PODS’90). 404--416.Google ScholarGoogle Scholar
  27. I. Cruz, A. O. Mendelzon, and P. Wood. 1987. A graphical query language supporting recursion. In ACM Special Interest Group on Management of Data 1987 Annual Conference (SIGMOD’87). 323--330.Google ScholarGoogle Scholar
  28. S. Demri and R. Lazić. 2009. LTL with the freeze quantifier and register automata. ACM TOCL 10, 3 (2009), 16:1--16:30.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Demri, R. Lazić, and D. Nowak. 2007. On the freeze quantifier in constraint LTL: Decidability and complexity. Inf. Comput. 205, 1 (2007), 2--24.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. A. Deutsch and V. Tannen. 2001. Optimization properties for classes of conjunctive regular path queries. In 8th International Workshop on Database Programming Languages (DBPL’01). 21--39.Google ScholarGoogle Scholar
  31. Dex. 2013. DEX query language, Sparsity Technologies. Retrieved from http://www.sparsity-technologies.com/dex.php.Google ScholarGoogle Scholar
  32. Facebook. 2014. Graph Search. Retrieved from https://www.facebook.com/about/graphsearch.Google ScholarGoogle Scholar
  33. W. Fan. 2012. Graph pattern matching revised for social network analysis. In 15th International Conference on Database Theory (ICDT). 8--21.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. D. Figueira. 2009. Satisfiability of downward XPath with data equality tests. In 28th ACM Symposium on Principles of Database Systems (PODS’09). 197--206.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. D. Figueira. 2010. Reasoning on Words and Trees with Data. Ph.D. Dissertation. ÉNS de Cachan.Google ScholarGoogle Scholar
  36. G. H. L. Fletcher, M. Gyssens, D. Leinders, J. Van den Bussche, D. Van Gucht, S. Vansummeren, and Y. Wu. 2011. Relative expressive power of navigational querying on graphs. In 14th International Conference on Database Theory (ICDT). 197--207.Google ScholarGoogle Scholar
  37. D. Florescu, A. Y. Levy, and D. Suciu. 1998. Query containment for conjunctive queries with regular expressions. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 139--148.Google ScholarGoogle Scholar
  38. S. Fortune, J. Hopcroft, and J. Wyllie. 1980. The directed homeomorphism problem. Theor. Comput. Sci. 10 (1980), 111--121.Google ScholarGoogle ScholarCross RefCross Ref
  39. G. Gottlob, C. Koch, and R. Pichler. 2005. Efficient algorithms for processing XPath queries. ACM Trans. Database Syst. 30, 2 (2005), 444--491.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Gremlin 2013. Gremlin Language. Retrieved from https://github.com/tinkerpop/gremlin/wiki.Google ScholarGoogle Scholar
  41. C. Gutierrez, C. Hurtado, A. O. Mendelzon, and J. Pérez. 2011. Foundations of semantic web databases. J. Comput. System Sci. 77, 3 (2011), 520--541.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. S. Harris and A. Seaborne. 2013. SPARQL 1.1 Query Language. W3C Recommendation. http://www.w3.org/ TR/sparql11-query/. (March 2013).Google ScholarGoogle Scholar
  43. M. Kaminski and N. Francez. 1994. Finite memory automata. Theor. Comput. Sci. 134, 2 (1994), 329--363.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. Kaminski and T. Tan. 2006. Regular expressions for languages over infinite alphabets. Fundamenta Informaticae 69, 3 (2006), 301--318.Google ScholarGoogle Scholar
  45. M. Kaminski and T. Tan. 2008. Tree automata over infinite alphabets. In Pillars of Computer Science. 386--423.Google ScholarGoogle Scholar
  46. M. Kay. 2004. XPath 2.0 Programmer’s Reference. Wrox.Google ScholarGoogle Scholar
  47. E. V. Kostylev, J. L. Reutter, M. Romero, and D. Vrgoč. 2015b. SPARQL with property paths. In The Semantic Web - 14th International Semantic Web Conference, Proceedings, Part I (ISWC’15). 3--18.Google ScholarGoogle Scholar
  48. E. V. Kostylev, J. L. Reutter, and D. Vrgoč. 2015a. XPath for DL ontologies. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI). 1525--1531.Google ScholarGoogle Scholar
  49. G. M. Kuper, L. Libkin, and J. Paredaens (Eds.). 2000. Constraint Databases. Springer.Google ScholarGoogle Scholar
  50. M. Lange. 2006. Model checking propositional dynamic logic with all extras. J. Appl. Logic 4, 1 (2006), 39--49.Google ScholarGoogle Scholar
  51. L. Libkin. 2004. Elements of Finite Model Theory. Springer.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. L. Libkin. 2014. Certain answers as objects and knowledge. In Principles of Knowledge Representation and Reasoning (KR’14). 328--337.Google ScholarGoogle Scholar
  53. L. Libkin, W. Martens, and D. Vrgoč. 2013a. Querying graph databases with XPath. In Joint 2013 EDBT/ICDT Conferences (ICDT'13).Google ScholarGoogle Scholar
  54. L. Libkin, J. L. Reutter, and D. Vrgoč. 2013b. TriAL for RDF: Adapting graph query languages for RDF data. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS).Google ScholarGoogle Scholar
  55. L. Libkin and D. Vrgoč. 2012a. Regular expressions for data words. In Logic for Programming, Artificial Intelligence, and Reasoning (LPAR). 274--288.Google ScholarGoogle Scholar
  56. L. Libkin and D. Vrgoč. 2012b. Regular path queries on graphs with data. In 15th International Conference on Database Theory (ICDT). 74--85.Google ScholarGoogle Scholar
  57. K. Losemann and W. Martens. 2012. The complexity of evaluating path expressions in SPARQL. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). 101--112.Google ScholarGoogle Scholar
  58. M. Marx. 2003. XPath and modal logics of finite DAG’s. In International Conference on Automated Reasoning with Analytic Tableaux and Related Methods (TABLEAUX). 150--164.Google ScholarGoogle ScholarCross RefCross Ref
  59. M. Marx. 2005. Conditional XPath. ACM Trans. Database Syst. 30, 4 (2005), 929--959.Google ScholarGoogle Scholar
  60. Neo4j 2013. Neo4j, The Graph Database. Retrieved from http://www.neo4j.org/.Google ScholarGoogle Scholar
  61. F. Neven, Th. Schwentick, and V. Vianu. 2004. Finite state machines for strings over infinite alphabets. ACM Trans. Comput. Log. 5, 3 (2004), 403--435.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. J. Pérez, M. Arenas, and C. Gutierrez. 2009. Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34, 3 (2009), 16:1--16:45.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. J. Pérez, M. Arenas, and C. Gutierrez. 2010. nSPARQL: A navigational language for RDF. J. Web Semantics 8, 4 (2010), 255--270.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. J. L. Reutter. 2013a. Containment of Nested Regular Expressions. CoRR abs/1304.2637. (2013).Google ScholarGoogle Scholar
  65. J. L. Reutter. 2013b. Graph Patterns: Structure, Query Answering and Applications in Schema Mappings and Formal Language Theory. Ph.D. Dissertation. School of Informatics, University of Edinburgh.Google ScholarGoogle Scholar
  66. J. L. Reutter, A. Soto, and D. Vrgoč. 2015. Recursion in SPARQL. In The Semantic Web - 14th International Semantic Web Conference, Proceedings, Part I (ISWC’15). 19--35.Google ScholarGoogle Scholar
  67. R. Ronen and O. Shmueli. 2009. SoQL: A language for querying and creating data in social networks. In 25th International Conference on Data Engineering (ICDE’09). 1595--1602.Google ScholarGoogle Scholar
  68. H. Sakamoto and D. Ikeda. 2000. Intractability of decision problems for finite-memory automata. Theor. Comput. Sci. 231, 2 (2000), 297--308.Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. M. San Martín and C. Gutierrez. 2009. Representing, querying and transforming social networks with RDF/SPARQL. In 6th European Semantic Web Conference (ESWC’09). 293--307.Google ScholarGoogle Scholar
  70. L. Segoufin. 2006. Automata and logics for words and trees over an infinite alphabet. In Computer Science Logic (CSL). 41--57.Google ScholarGoogle Scholar
  71. L. Segoufin. 2007. Static analysis of XML processing with data values. SIGMOD Record 36, 1 (2007), 31--38.Google ScholarGoogle Scholar
  72. A. Tarski and S. Givant. 1987. A Formalization of Set Theory Without Variables. AMS.Google ScholarGoogle Scholar
  73. B. ten Cate. 2006. The expressivity of XPath with transitive closure. In 25th ACM Symposium on Principles of Database Systems (PODS’06). 328--337.Google ScholarGoogle Scholar
  74. B. ten Cate and M. Marx. 2007. Navigational XPath: Calculus and algebra. Sigmod Record 36, 2 (2007), 19--26.Google ScholarGoogle Scholar
  75. D. Vrgoč. 2014. Querying Graphs with Data. Ph.D. Dissertation. School of Informatics, University of Edinburgh.Google ScholarGoogle Scholar
  76. P. T. Wood. 2012. Query languages for graph databases. Sigmod Record 41, 1 (2012), 50--60.Google ScholarGoogle Scholar

Index Terms

  1. Querying Graphs with Data

          Recommendations

          Reviews

          Jaroslav Pokorny

          Various approaches to querying so-called graphs with data are described very comprehensively in this paper. In principle, this category of graphs is called (labeled) property graphs in the graph database community. Concerning queries in such databases, two categories are considered: one views graphs as sets of relational data, and the other concentrates on querying the topology of the graph. The authors focus on querying that combines both of these approaches. In section 2, containing preliminaries, an important notion of path and various types of complexity of query evaluation are defined. In the forthcoming sections, the authors deal with two categories of languages: path languages and graph languages. Also, notions like regular path queries (RPQs), conjunctive RPQs (CRPQs), and nested regular expressions (NREs) are introduced. Section 3 discusses existing languages for paths, particularly register automata and related classes of expressions focused on data path queries. First, regular data path queries (RDPQs) are studied, followed by regular query expressions with memory (RQM), which use regular expressions with memory (REM). Another class of regular expressions for data paths permits testing for equality or inequality of data; this enables consideration of queries based on regular expressions with equality. The authors talk about regular queries with datasets (RQDs). The core section 4 shows how the XPath language can be adapted to work over graphs. XPath has many desirable properties, such as the ability to define patterns that cannot be captured by paths, close connections to first-order logic, and efficient evaluation algorithms. The authors show how to extend the language to operate over graph databases while still retaining these properties. Such a language is called GPath here, and it is discussed with different data tests permitted in its formulas. Based on two categories of path formulas, the regular graph XPath (GXPathreg) and core graph XPath (GXPathcore) languages are introduced. Some extensions of both languages with counters are also discussed. The next two subsections of section 4 focus on query evaluation and expressive power. The authors investigate the complexity of querying graph databases using variants of GXPath and its many dialects when compared to first-order logic. They provide a detailed analysis of expressiveness for navigational features of these dialects and prove that GXPathcore is a proper subset of GXPathreg. Another result says that GXPathreg is equivalent to a three-variable fragment of the transitive-closer first-order logic. Expressiveness of languages using datasets is also studied. The rest of section 4 contains a comparison of GXPath to path languages introduced in section 3, as well as to traditional navigational languages such as RPQs, CRPQs, and NREs. A hierarchy of the GXPath fragments with regard to their relative expressive power illustrates their mutual relationships. In section 5, the authors define conjunctive queries using languages from previous sections as their basic building blocks. Section 6 includes a summary of complexity results for classes of queries studied in the paper. Finally, section 6 offers a number of possible directions for future research. The consideration of incomplete information is very realistic, since it occurs very often in practice, particularly in querying graphs with data. The notions defined in the paper are specified in a usual denotational way, which offers the needed clarity and preciseness. This increases the readability of the paper. Without a doubt, the paper offers interesting, valuable, and useful material for those interested in graph query languages and the associated theory. Online Computing Reviews Service

          Access critical reviews of Computing literature here

          Become a reviewer for Computing Reviews.

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Journal of the ACM
            Journal of the ACM  Volume 63, Issue 2
            May 2016
            249 pages
            ISSN:0004-5411
            EISSN:1557-735X
            DOI:10.1145/2906142
            Issue’s Table of Contents

            Copyright © 2016 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 20 March 2016
            • Revised: 1 December 2015
            • Accepted: 1 December 2015
            • Received: 1 September 2014
            Published in jacm Volume 63, Issue 2

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article
            • Research
            • Refereed

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader