skip to main content
10.1145/1559845.1559911acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Results Reproduced / v1.1

Scalable join processing on very large RDF graphs

Authors Info & Claims
Published:29 June 2009Publication History

ABSTRACT

With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans.

We present two contributions for scalable join processing. First, we develop very light-weight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimations for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.

References

  1. D. J. Abadi et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Auer et al. Dbpedia: A nucleus for a web of open data. In ISWC/ASWC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. A. Bernstein and D.-M. W. Chiu. Using semi-joins to solve relational queries. J. ACM, 28(1):25--40, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Bio2RDF, semantic web atlas of postgenomic knowledge about human and mouse. http://www.bio2rdf.org/.Google ScholarGoogle Scholar
  6. U. Bojars et al. Interlinking the social web with semantics. IEEE Intelligent Systems, 23(3), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Broekstra et al. Sesame: An architecture for storing and querying rdf data and schema information. In Spinning the Semantic Web, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. E. I. Chong et al. An efficient sql-based rdf querying scheme. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Dbpedia 3.2 downloads. http://wiki.dbpedia.org/Downloads32.Google ScholarGoogle Scholar
  10. D. DeHaan and F. W. Tompa. Optimal top-down join enumeration. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Deshpande et al. Adaptive query processing. Foundations and Trends in Databases, 1(1), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Open directory RDF dump. http://rdf.dmoz.org/.Google ScholarGoogle Scholar
  13. Metaweb technologies: Freebase data dumps. http://download.freebase.com/datadumps/.Google ScholarGoogle Scholar
  14. C. A. Galindo-Legaria, A. Pellenkoft, and M. L. Kersten. Fast, randomized join-order selection - why use transformations? In VLDB, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. Graefe. Query evaluation techniques for large databases. ACM Comput. Surv., 25(2):73--170, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Y. Halevy, M. J. Franklin, and D. Maier. Principles of dataspace systems. In PODS, pages 1--9, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Harth et al. Yars2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Z. G. Ives and N. E. Taylor. Sideways information passing for push-style query processing. In ICDE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Jena: a Semantic Web Framework for Java. http://jena.sourceforge.net/.Google ScholarGoogle Scholar
  20. D. Kossmann. The state of the art in distributed query processing. ACM Comput. Surv., 32(4):422--469, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. Maduko et al. Estimating the cardinality of rdf graph patterns. In WWW, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. G. Moerkotte and T. Neumann. Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. MonetDB. http://monetdb.cwi.nl/.Google ScholarGoogle Scholar
  24. I. S. Mumick and H. Pirahesh. Implementation of magic-sets in a relational database system. In SIGMOD, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. T. Neumann and G. Weikum. Rdf-3x: a risc-style engine for rdf. PVLDB, 1(1):647--659, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. OpenRDF. http://www.openrdf.org/index.jsp.Google ScholarGoogle Scholar
  27. PostgreSQL. http://www.postgresql.org/.Google ScholarGoogle Scholar
  28. RDF-3X. http://www.mpi-inf.mpg.de/~neumann/rdf3x.Google ScholarGoogle Scholar
  29. Semantic web challenge 2008. billion triples track. http://challenge.semanticweb.org/.Google ScholarGoogle Scholar
  30. P. Seshadri et al. Cost-based optimization for magic: Algebra and implementation. In SIGMOD, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. L. Sidirourgos et al. Column-store support for rdf data management: not all swans are white. PVLDB, 1(2):1553--1563, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. Steinbrunn et al. Bypassing joins in disjunctive queries. In VLDB, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. Stocker et al. Integrating semi-join-reducers into state of the art query processors. In ICDE, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. M. Stocker et al. Sparql basic graph pattern optimization using selectivity estimation. In WWW, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In WWW, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. O. Udrea, A. Pugliese, and V. S. Subrahmanian. Grin: A graph based rdf index. In AAAI, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Uniprot RDF. http://dev.isb-sib.ch/projects/uniprot-rdf/.Google ScholarGoogle Scholar
  38. C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1):1008--1019, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. K. Wilkinson et al. Efficient rdf storage and retrieval in jena2. In SWDB, 2003.Google ScholarGoogle Scholar
  40. F. Wu and D. S. Weld. Automatically refining the wikipedia infobox ontology. In WWW, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Scalable join processing on very large RDF graphs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
        June 2009
        1168 pages
        ISBN:9781605585512
        DOI:10.1145/1559845

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 29 June 2009

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader