DOI: 10.1145/1999299.1999303
research-article

PigSPARQL: mapping SPARQL to Pig Latin

Published: 12 June 2011

ABSTRACT

In this paper, we investigate the scalable processing of complex SPARQL queries on very large RDF datasets. As the underlying platform, we use Apache Hadoop, an open-source implementation of Google's MapReduce for massively parallel computation on a computer cluster. We introduce PigSPARQL, a system that processes complex SPARQL queries on a MapReduce cluster. To this end, SPARQL queries are translated into Pig Latin, a data analysis language developed by Yahoo! Research. Pig Latin programs are executed as a series of MapReduce jobs on a Hadoop cluster. We evaluate query processing with PigSPARQL using SP2Bench, a SPARQL-specific performance benchmark, and demonstrate that PigSPARQL enables scalable execution of SPARQL queries on Hadoop without any additional programming effort.
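To make the idea of translating SPARQL into Pig Latin more concrete, the following is a minimal sketch of how two triple patterns sharing a variable might be turned into Pig Latin statements. It is not the translation scheme from the paper: the function names, the relation aliases (`triples`, `t0`, `t1`, `result`), the input path `triples.tsv`, and the assumption that the data is stored as a tab-separated three-column file of subject, predicate, and object are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: NOT the PigSPARQL translation from the paper.
# Assumes RDF triples are stored as a tab-separated file with columns s, p, o.

def is_var(term):
    """SPARQL variables start with '?'."""
    return term.startswith("?")


def translate_pattern(alias, pattern, source="triples"):
    """Emit Pig Latin for one triple pattern.

    Bound terms become a FILTER condition; variables become projected,
    renamed columns. Assumes the pattern has at least one variable and
    that no variable occurs twice within the same pattern.
    """
    fields = ("s", "p", "o")
    conds = [f"{f} == '{t}'" for f, t in zip(fields, pattern) if not is_var(t)]
    proj = [f"{f} AS {t[1:]}" for f, t in zip(fields, pattern) if is_var(t)]
    lines = []
    src = source
    if conds:
        lines.append(f"{alias}_f = FILTER {source} BY {' AND '.join(conds)};")
        src = f"{alias}_f"
    lines.append(f"{alias} = FOREACH {src} GENERATE {', '.join(proj)};")
    return lines


def translate_two_patterns(p1, p2, path="triples.tsv"):
    """Translate two triple patterns and join them on their shared variables."""
    vars1 = {t[1:] for t in p1 if is_var(t)}
    vars2 = {t[1:] for t in p2 if is_var(t)}
    shared = sorted(vars1 & vars2)
    if not shared:
        raise ValueError("patterns share no variable; a cross product is not handled here")
    # Pig joins on several keys take a parenthesised key list.
    key = shared[0] if len(shared) == 1 else "(" + ", ".join(shared) + ")"
    lines = [f"triples = LOAD '{path}' AS (s:chararray, p:chararray, o:chararray);"]
    lines += translate_pattern("t0", p1)
    lines += translate_pattern("t1", p2)
    lines.append(f"result = JOIN t0 BY {key}, t1 BY {key};")
    return lines


# Hypothetical query: articles together with their titles.
script = translate_two_patterns(
    ("?article", "rdf:type", "bench:Article"),
    ("?article", "dc:title", "?title"),
)
print("\n".join(script))
```

Each triple pattern becomes a filter over the triple relation followed by a projection, and shared variables become join keys. When Pig runs such a script, each join is typically compiled into a MapReduce job, which is the sense in which a query is executed as a series of jobs.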


Published in

SWIM '11: Proceedings of the International Workshop on Semantic Web Information Management
June 2011
61 pages
ISBN: 9781450306515
DOI: 10.1145/1999299

Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
