DOI: 10.1145/2820783.2820860
short-paper

GeoSpark: a cluster computing framework for processing large-scale spatial data

Published: 03 November 2015

ABSTRACT

This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of three layers: the Apache Spark Layer, the Spatial RDD Layer, and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Spark functionality, including loading data from and storing data to disk, as well as regular RDD operations. The Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs), which extend regular Apache Spark RDDs to support geometrical and spatial objects. GeoSpark provides a geometrical operations library that accesses Spatial RDDs to perform basic geometrical operations (e.g., Overlap, Intersect). System users can leverage the newly defined SRDDs to effectively develop spatial data processing programs in Spark. The Spatial Query Processing Layer efficiently executes spatial query processing algorithms (e.g., spatial range, join, and KNN queries) on SRDDs. GeoSpark also allows users to create a spatial index (e.g., R-tree, Quad-tree) in each SRDD partition to boost spatial data processing performance. Preliminary experiments show that GeoSpark achieves better run-time performance than its Hadoop-based counterparts (e.g., SpatialHadoop).
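To make the idea of querying a partitioned spatial dataset concrete, the following is a minimal single-machine sketch in plain Python. It is not GeoSpark's API or its published algorithms: the function names (`grid_partition`, `range_query`, `knn_query`), the uniform-grid partitioner, and the "local top-k, then merge" KNN strategy are all assumptions chosen for illustration. Each inner list stands in for one partition of a spatial dataset, and no per-partition index (R-tree or Quad-tree) is built.

```python
import heapq
from math import hypot

# Assumed conventions for this sketch:
#   point  = (x, y)
#   bounds = (min_x, min_y, max_x, max_y)


def grid_partition(points, bounds, cells_per_side):
    """Assign each point to one cell of a uniform grid.

    Each cell plays the role of one partition of the distributed dataset.
    """
    min_x, min_y, max_x, max_y = bounds
    width = (max_x - min_x) / cells_per_side
    height = (max_y - min_y) / cells_per_side
    partitions = [[] for _ in range(cells_per_side * cells_per_side)]
    for x, y in points:
        # Clamp so points on the upper boundary fall into the last cell.
        col = min(int((x - min_x) / width), cells_per_side - 1)
        row = min(int((y - min_y) / height), cells_per_side - 1)
        partitions[row * cells_per_side + col].append((x, y))
    return partitions


def contains(window, point):
    """Return True if the point lies inside the closed query rectangle."""
    min_x, min_y, max_x, max_y = window
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def range_query(partitions, window):
    """Filter every partition independently, then concatenate the results."""
    return [p for part in partitions for p in part if contains(window, p)]


def knn_query(partitions, query, k):
    """Take the k nearest points per partition, then merge to a global top k."""
    def dist(p):
        return hypot(p[0] - query[0], p[1] - query[1])

    local_best = [heapq.nsmallest(k, part, key=dist) for part in partitions]
    candidates = [p for best in local_best for p in best]
    return heapq.nsmallest(k, candidates, key=dist)


if __name__ == "__main__":
    points = [(1, 1), (2, 8), (5, 5), (9, 2), (7, 7), (3, 4)]
    parts = grid_partition(points, (0, 0, 10, 10), cells_per_side=2)
    print(range_query(parts, (2, 3, 6, 6)))
    print(knn_query(parts, (4, 4), k=2))
```

In this sketch every partition is scanned in full. A per-partition spatial index of the kind the abstract mentions would replace the linear scan inside `range_query` and `knn_query` with an index lookup, while the overall "work per partition, then combine" shape would stay the same.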

        • Published in

          SIGSPATIAL '15: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems
          November 2015
          646 pages
          ISBN: 9781450339674
          DOI: 10.1145/2820783

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          SIGSPATIAL '15 paper acceptance rate: 38 of 212 submissions, 18%. Overall acceptance rate: 220 of 1,116 submissions, 20%.
