research-article

Scalable join processing on very large RDF graphs

Authors:
Thomas Neumann

Max-Planck Institute for Informatics, Saarbrücken, Germany

Max-Planck Institute for Informatics, Saarbrücken, Germany
View Profile

,
Gerhard Weikum

Max-Planck Institute for Informatics, Saarbrücken, Germany

Max-Planck Institute for Informatics, Saarbrücken, Germany
View Profile

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of dataJune 2009Pages 627–640https://doi.org/10.1145/1559845.1559911

Published:29 June 2009Publication History

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Pages 627–640

ABSTRACT

With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans.

We present two contributions for scalable join processing. First, we develop very light-weight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimations for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.

References

D. J. Abadi et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007. Google ScholarDigital Library
S. Auer et al. Dbpedia: A nucleus for a web of open data. In ISWC/ASWC, 2007. Google ScholarDigital Library
R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, 2000. Google ScholarDigital Library
P. A. Bernstein and D.-M. W. Chiu. Using semi-joins to solve relational queries. J. ACM, 28(1):25--40, 1981. Google ScholarDigital Library
Bio2RDF, semantic web atlas of postgenomic knowledge about human and mouse. http://www.bio2rdf.org/.Google Scholar
U. Bojars et al. Interlinking the social web with semantics. IEEE Intelligent Systems, 23(3), 2008. Google ScholarDigital Library
J. Broekstra et al. Sesame: An architecture for storing and querying rdf data and schema information. In Spinning the Semantic Web, 2003. Google ScholarDigital Library
E. I. Chong et al. An efficient sql-based rdf querying scheme. In VLDB, 2005. Google ScholarDigital Library
Dbpedia 3.2 downloads. http://wiki.dbpedia.org/Downloads32.Google Scholar
D. DeHaan and F. W. Tompa. Optimal top-down join enumeration. In SIGMOD, 2007. Google ScholarDigital Library
A. Deshpande et al. Adaptive query processing. Foundations and Trends in Databases, 1(1), 2007. Google ScholarDigital Library
Open directory RDF dump. http://rdf.dmoz.org/.Google Scholar
Metaweb technologies: Freebase data dumps. http://download.freebase.com/datadumps/.Google Scholar
C. A. Galindo-Legaria, A. Pellenkoft, and M. L. Kersten. Fast, randomized join-order selection - why use transformations? In VLDB, 1994. Google ScholarDigital Library
G. Graefe. Query evaluation techniques for large databases. ACM Comput. Surv., 25(2):73--170, 1993. Google ScholarDigital Library
A. Y. Halevy, M. J. Franklin, and D. Maier. Principles of dataspace systems. In PODS, pages 1--9, 2006. Google ScholarDigital Library
A. Harth et al. Yars2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, 2007. Google ScholarDigital Library
Z. G. Ives and N. E. Taylor. Sideways information passing for push-style query processing. In ICDE, 2008. Google ScholarDigital Library
Jena: a Semantic Web Framework for Java. http://jena.sourceforge.net/.Google Scholar
D. Kossmann. The state of the art in distributed query processing. ACM Comput. Surv., 32(4):422--469, 2000. Google ScholarDigital Library
A. Maduko et al. Estimating the cardinality of rdf graph patterns. In WWW, 2007. Google ScholarDigital Library
G. Moerkotte and T. Neumann. Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. In VLDB, 2006. Google ScholarDigital Library
MonetDB. http://monetdb.cwi.nl/.Google Scholar
I. S. Mumick and H. Pirahesh. Implementation of magic-sets in a relational database system. In SIGMOD, 1994. Google ScholarDigital Library
T. Neumann and G. Weikum. Rdf-3x: a risc-style engine for rdf. PVLDB, 1(1):647--659, 2008. Google ScholarDigital Library
OpenRDF. http://www.openrdf.org/index.jsp.Google Scholar
PostgreSQL. http://www.postgresql.org/.Google Scholar
RDF-3X. http://www.mpi-inf.mpg.de/~neumann/rdf3x.Google Scholar
Semantic web challenge 2008. billion triples track. http://challenge.semanticweb.org/.Google Scholar
P. Seshadri et al. Cost-based optimization for magic: Algebra and implementation. In SIGMOD, 1996. Google ScholarDigital Library
L. Sidirourgos et al. Column-store support for rdf data management: not all swans are white. PVLDB, 1(2):1553--1563, 2008. Google ScholarDigital Library
M. Steinbrunn et al. Bypassing joins in disjunctive queries. In VLDB, 1995. Google ScholarDigital Library
K. Stocker et al. Integrating semi-join-reducers into state of the art query processors. In ICDE, 2001. Google ScholarDigital Library
M. Stocker et al. Sparql basic graph pattern optimization using selectivity estimation. In WWW, 2008. Google ScholarDigital Library
F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In WWW, 2007. Google ScholarDigital Library
O. Udrea, A. Pugliese, and V. S. Subrahmanian. Grin: A graph based rdf index. In AAAI, 2007. Google ScholarDigital Library
Uniprot RDF. http://dev.isb-sib.ch/projects/uniprot-rdf/.Google Scholar
C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1):1008--1019, 2008. Google ScholarDigital Library
K. Wilkinson et al. Efficient rdf storage and retrieval in jena2. In SWDB, 2003.Google Scholar
F. Wu and D. S. Weld. Automatically refining the wikipedia infobox ontology. In WWW, 2008. Google ScholarDigital Library

Index Terms

Scalable join processing on very large RDF graphs
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Reformulation-Based Query Answering for RDF Graphs with RDFS Ontologies
The Semantic Web
Abstract
Query answering in RDF knowledge bases has traditionally been performed either through graph saturation, i.e., adding all implicit triples to the graph, or through query reformulation, i.e., modifying the query to look for the explicit triples ...
Read More
The RDF-3X engine for scalable management of RDF data

RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its ...
Read More
Combining Joint and Semi-Join Operations for Distributed Query Processing

The application of a combination of join and semi-join operations to minimize the amount of data transmission required for distributed query processing is discussed. Specifically, two important concepts that occur with the use of join operations as ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
June 2009
1168 pages
ISBN:9781605585512
DOI:10.1145/1559845
Editors:
Carsten Binnig,
Benoit Dageville,
General Chairs:
Uğur Çetintemel
Brown University, USA
,
Stan Zdonik
Brown University, USA
,
Program Chair:
Donald Kossmann
ETH Zurich, Switzerland
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 June 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Badges
- Results Reproduced / v1.1
Author Tags
query processing
rdf
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 149
  Total Citations
  View Citations
- 2,496
  Total Downloads
- Downloads (Last 12 months)45
- Downloads (Last 6 weeks)8
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Scalable join processing on very large RDF graphs

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Reformulation-Based Query Answering for RDF Graphs with RDFS Ontologies

The RDF-3X engine for scalable management of RDF data

Combining Joint and Semi-Join Operations for Distributed Query Processing