research-article

PigSPARQL: mapping SPARQL to Pig Latin

Authors:
Alexander Schätzle, University of Freiburg, Germany
Martin Przyjaciel-Zablocki, University of Freiburg, Germany
Georg Lausen, University of Freiburg, Germany

SWIM '11: Proceedings of the International Workshop on Semantic Web Information Management, June 2011, Article No. 4, pages 1–8. https://doi.org/10.1145/1999299.1999303

Published: 12 June 2011

ABSTRACT

In this paper we investigate the scalable processing of complex SPARQL queries on very large RDF datasets. As the underlying platform we use Apache Hadoop, an open-source implementation of Google's MapReduce framework for massively parallel computation on a computer cluster. We introduce PigSPARQL, a system that processes complex SPARQL queries on a MapReduce cluster. To this end, SPARQL queries are translated into Pig Latin, a data analysis language developed by Yahoo! Research. Pig Latin programs are executed as a series of MapReduce jobs on a Hadoop cluster. We evaluate SPARQL query processing with PigSPARQL using SP²Bench, a SPARQL-specific performance benchmark, and demonstrate that PigSPARQL enables scalable execution of SPARQL queries on Hadoop without any additional programming effort.
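The translation step described above can be illustrated with a small Python sketch. It is not PigSPARQL's implementation; it only shows the general idea of compiling a SPARQL basic graph pattern (a conjunction of triple patterns) into Pig Latin relational operators: FILTER keeps the triples that match a pattern's constants, FOREACH/GENERATE renames columns to variable names, and JOIN (or CROSS when no variable is shared) combines intermediate results on their shared variables. Pig then compiles such a script into MapReduce jobs. Assumptions, all hypothetical and chosen for illustration: the RDF data is one tab-separated file `triples.tsv` with columns `s`, `p`, `o`; the helpers `pattern_to_pig` and `bgp_to_pig` are invented names; and patterns are joined left-deep in the order given, with no selectivity-based reordering.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A triple pattern is (subject, predicate, object); terms starting with '?' are variables.
TriplePattern = Tuple[str, str, str]
# Hypothetical column names of the loaded triple relation.
FIELDS = ("s", "p", "o")


def pattern_to_pig(pattern: TriplePattern, alias: str) -> Tuple[List[str], List[str]]:
    """Emit Pig statements that match one triple pattern; return them and its variables."""
    conditions: List[str] = []
    first_field: Dict[str, str] = {}  # variable -> column where it first occurs
    for field, term in zip(FIELDS, pattern):
        if term.startswith("?"):
            var = term[1:]  # note: names clashing with Pig keywords are not renamed
            if var in first_field:
                # Repeated variable, e.g. (?x, knows, ?x): both columns must be equal.
                conditions.append(f"{field} == {first_field[var]}")
            else:
                first_field[var] = field
        else:
            # Constant term: keep only triples whose column equals it.
            conditions.append(f"{field} == '{term}'")
    if not first_field:
        raise ValueError("patterns without variables are not handled in this sketch")
    stmts: List[str] = []
    source = "triples"
    if conditions:
        stmts.append(f"{alias}_f = FILTER triples BY {' AND '.join(conditions)};")
        source = f"{alias}_f"
    # Rename columns to variable names so later joins can refer to them.
    gens = ", ".join(f"{field} AS {var}" for var, field in first_field.items())
    stmts.append(f"{alias} = FOREACH {source} GENERATE {gens};")
    return stmts, list(first_field)


def bgp_to_pig(patterns: Sequence[TriplePattern],
               select: Optional[Sequence[str]] = None,
               input_path: str = "triples.tsv") -> str:
    """Translate a basic graph pattern into a left-deep Pig Latin join plan."""
    script = [f"triples = LOAD '{input_path}' USING PigStorage('\\t') "
              "AS (s:chararray, p:chararray, o:chararray);"]
    current: Optional[str] = None
    current_vars: List[str] = []
    for i, pattern in enumerate(patterns, start=1):
        alias = f"t{i}"
        stmts, pvars = pattern_to_pig(pattern, alias)
        script.extend(stmts)
        if current is None:
            current, current_vars = alias, pvars
            continue
        shared = [v for v in current_vars if v in pvars]
        raw = f"j{i}_raw"
        if shared:
            keys = shared[0] if len(shared) == 1 else "(" + ", ".join(shared) + ")"
            script.append(f"{raw} = JOIN {current} BY {keys}, {alias} BY {keys};")
        else:
            # No shared variable: the SPARQL join degenerates to a cross product.
            script.append(f"{raw} = CROSS {current}, {alias};")
        # Pig prefixes fields with 'relation::' after JOIN/CROSS; keep one copy per variable.
        merged = current_vars + [v for v in pvars if v not in current_vars]
        gens = ", ".join(
            f"{current if v in current_vars else alias}::{v} AS {v}" for v in merged)
        script.append(f"j{i} = FOREACH {raw} GENERATE {gens};")
        current, current_vars = f"j{i}", merged
    if current is None:
        raise ValueError("empty basic graph pattern")
    out_vars = [v.lstrip("?") for v in select] if select else current_vars
    unknown = [v for v in out_vars if v not in current_vars]
    if unknown:
        raise ValueError(f"unbound variables in SELECT: {unknown}")
    script.append(f"result = FOREACH {current} GENERATE {', '.join(out_vars)};")
    return "\n".join(script)


if __name__ == "__main__":
    print(bgp_to_pig([("?x", "rdf:type", "bench:Article"),
                      ("?x", "dc:creator", "?author")],
                     select=["?x", "?author"]))
```

For the two-pattern query in the `__main__` block, the sketch emits one FILTER/FOREACH pair per triple pattern, `j2_raw = JOIN t1 BY x, t2 BY x;` on the shared variable `?x`, and a final projection onto `x` and `author`. The prefixed names such as `rdf:type` assume the data file stores terms in that form; full IRIs would be handled the same way. Complex query features (OPTIONAL, UNION, FILTER expressions) and join-order optimization are outside the scope of this sketch.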

References

D. J. Abadi, A. Marcus, S. Madden, and K. J. Hollenbach. Scalable Semantic Web Data Management Using Vertical Partitioning. In Proc. VLDB, pages 411–422, 2007.
Apache. Pig Latin Reference Manual 1 & 2. http://pig.apache.org/docs/, 2010.
J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF Schema. In Proc. ISWC, pages 54–68. Springer, 2002.
H. Choi, J. Son, Y. Cho, M. K. Sung, and Y. D. Chung. SPIDER: A System for Scalable, Parallel/Distributed Evaluation of Large-Scale RDF Data. In Proc. CIKM, pages 2087–2088, 2009.
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
O. Erling and I. Mikhailov. Towards web scale RDF. In Proc. SSWS, 2008.
A. F. Gates, O. Natkovich, S. Chopra, P. Kamath, S. M. Narayanamurthy, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of map-reduce: the Pig experience. Proc. VLDB Endow., 2:1414–1425, 2009.
S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proc. SOSP, pages 29–43, 2003.
Y. Guo, Z. Pan, and J. Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2–3):158–182, 2005.
S. Harris, N. Lamb, and N. Shadbolt. 4store: The design and implementation of a clustered RDF store. In Proc. SSWS, page 81, 2009.
O. Hartig and R. Heese. The SPARQL query graph model for query optimization. The Semantic Web: Research and Applications, pages 564–578, 2007.
M. Husain, L. Khan, M. Kantarcioglu, and B. Thuraisingham. Data intensive query processing for large RDF graphs using cloud computing tools. In Proc. CLOUD, pages 1–10. IEEE, 2010.
M. Ley. DBLP Bibliography. http://www.informatik.uni-trier.de/ley/db/, 2010.
J. Lin and C. Dyer. Data-intensive text processing with MapReduce. Synthesis Lectures on Human Language Technologies, 3(1):1–177, 2010.
F. Manola, E. Miller, and B. McBride. RDF Primer. http://www.w3.org/TR/rdf-primer/, 2004.
B. McBride. Jena: Implementing the RDF Model and Syntax Specification. In SemWeb, 2001.
P. Mika and G. Tummarello. Web Semantics in the Clouds. IEEE Intelligent Systems, 23(5):82–87, 2008.
J. Myung, J. Yeon, and S. Lee. SPARQL basic graph pattern processing with iterative MapReduce. In Proc. MDAC, pages 1–6. ACM, 2010.
T. Neumann and G. Weikum. RDF-3X: a RISC-style engine for RDF. Proc. of the VLDB Endowment, 1(1):647–659, 2008.
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: a not-so-foreign language for data processing. In Proc. SIGMOD, pages 1099–1110. ACM, 2008.
A. Owens, A. Seaborne, and N. Gibbins. Clustered TDB: A Clustered Triple Store for Jena. 2008.
A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In Proc. SIGMOD, pages 165–178. ACM, 2009.
J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of SPARQL. ACM Transactions on Database Systems (TODS), 34(3):1–45, 2009.
E. Prud'hommeaux and A. Seaborne. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/, 2006.
P. Ravindra, V. Deshpande, and K. Anyanwu. Towards scalable RDF graph analytics on MapReduce. In Proc. MDAC, pages 1–6. ACM, 2010.
A. Schätzle, M. Przyjaciel-Zablocki, T. Hornung, and G. Lausen. PigSPARQL: Übersetzung von SPARQL nach PigLatin [PigSPARQL: Translating SPARQL to Pig Latin]. In Proc. BTW, pages 65–84, 2011.
M. Schmidt, T. Hornung, G. Lausen, and C. Pinkel. SP²Bench: A SPARQL Performance Benchmark. In Proc. ICDE, pages 222–233, 2009.
M. Schmidt, M. Meier, and G. Lausen. Foundations of SPARQL query optimization. In Proc. ICDT, pages 4–33, 2010.
M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph pattern optimization using selectivity estimation. In Proc. WWW, pages 595–604. ACM, 2008.


Published in
SWIM '11: Proceedings of the International Workshop on Semantic Web Information Management
June 2011, 61 pages
ISBN: 9781450306515
DOI: 10.1145/1999299
General Chairs: Roberto De Virgilio (Università Roma Tre, Italy), Fausto Giunchiglia (Università di Trento, Italy), Letizia Tanca (Politecnico di Milano, Italy)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States

Article Metrics
- 93
  Total Citations
  View Citations
- 494
  Total Downloads
- Downloads (Last 12 months)12
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all
