RDF Data Storage and Query Processing Schemes: A Survey

Authors:
Marcin Wylot, TU Berlin / Fraunhofer FOKUS, Germany
Manfred Hauswirth, TU Berlin / Fraunhofer FOKUS, Germany
Philippe Cudré-Mauroux, University of Fribourg, Switzerland
Sherif Sakr, University of Tartu, Estonia; King Saud bin Abdulaziz University for Health Sciences, Saudi Arabia (ORCID: 0000-0002-2503-523X)

ACM Computing Surveys, Volume 51, Issue 4, Article No. 84, pp. 1–36. https://doi.org/10.1145/3177850

Published: 06 September 2018

Abstract

The Resource Description Framework (RDF) is a core data representation format for Linked Data and the Semantic Web. It supports a generic graph-based data model for describing things, including their relationships with other things. As RDF datasets grow rapidly, RDF data management systems must be able to cope with ever larger volumes of data. Although RDF data can be stored physically in a single relational table, querying a giant triple table becomes very expensive because of the multiple nested joins required to answer graph queries. In addition, the heterogeneity of RDF data poses entirely new challenges to database systems. This article provides a comprehensive study of the state of the art in handling and querying RDF data. In particular, we focus on data storage techniques, indexing strategies, and query execution mechanisms. Moreover, we provide a classification of existing systems and approaches. We also provide an overview of the various benchmarking efforts in this context and discuss some of the open problems in this domain.
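
The join cost mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single three-column triple table held in SQLite; the table name `triples`, the sample data, and the helper functions are hypothetical and are not taken from the survey. A graph pattern with three triple patterns translates into two self-joins of the same table, and each additional pattern adds one more.

```python
import sqlite3

# Hypothetical toy dataset; all identifiers are illustrative only.
TRIPLES = [
    ("article1", "hasAuthor", "alice"),
    ("article1", "publishedIn", "journalA"),
    ("article2", "hasAuthor", "bob"),
    ("article2", "publishedIn", "journalB"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
]


def build_triple_table(triples):
    """Load (subject, predicate, object) rows into one in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def author_names_for_venue(conn, venue):
    """Evaluate a three-pattern graph query as self-joins on the triple table.

    Pattern: ?article publishedIn <venue> .
             ?article hasAuthor   ?person .
             ?person  name        ?name
    """
    sql = """
        SELECT t3.o
        FROM triples AS t1
        JOIN triples AS t2 ON t2.s = t1.s   -- same ?article
        JOIN triples AS t3 ON t3.s = t2.o   -- ?person links patterns 2 and 3
        WHERE t1.p = 'publishedIn' AND t1.o = ?
          AND t2.p = 'hasAuthor'
          AND t3.p = 'name'
        ORDER BY t3.o
    """
    return [row[0] for row in conn.execute(sql, (venue,))]


conn = build_triple_table(TRIPLES)
names = author_names_for_venue(conn, "journalA")
print(names)
```

Every triple pattern scans the same table, so without suitable indexes the cost of such a query grows with both the table size and the number of patterns. That is one motivation for the storage and indexing techniques the survey covers.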

Supplemental Material

Available for download: wylot.zip (40.3 KB). Supplemental movie, appendix, image, and software files for "RDF Data Storage and Query Processing Schemes: A Survey".

Index Terms

RDF Data Storage and Query Processing Schemes: A Survey
1. Information systems
  1. Data management systems
    1. Database design and models
      1. Data model extensions
        Semi-structured data
      2. Graph-based database models
    2. Query languages

Published in
ACM Computing Surveys Volume 51, Issue 4
July 2019
765 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3236632
Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 September 2018
- Revised: 1 December 2017
- Accepted: 1 December 2017
- Received: 1 November 2016
Author Tags
RDF
SPARQL
semi-structured data
Qualifiers
- survey
- Research
- Refereed

Article Metrics
- Total Citations: 74
- Total Downloads: 1,717
- Downloads (Last 12 months): 174
- Downloads (Last 6 weeks): 9