ABSTRACT
Given two layers of large polygonal datasets, detecting those pairs of cross-layer polygons which satisfy a join predicate, such as intersection or contain, is one of the most computationally intensive primitive operations in the spatial domain applications. In this work, we introduce GCMF, an end-to-end software system, that is able to handle spatial join (with ST_Intersect operation) over non-indexed polygonal datasets with over 3 GB file size comprising more than 600, 000 polygons on a single GPU within less than 8 sec by applying innovative filter and refinement techniques. GCMF performs a two-step filtering phase. 1) A sort-based Minimum Bounding Rectangle (MBR) filtering step detects potentially overlapping polygon pairs up to 20 times faster than the optimized GEOS library routine. 2) A linear time Common MBR filtering step (based on the overlapping area of two given MBRs) that not only eliminates two-third of the candidate polygon pairs but also reduces the number of edges to be considered in the refinement phase by 40-fold on an average based on our experimental results with real datasets. Furthermore, for the refinement phase, GCMF implements a load-balanced parallel point-in-polygon and edge-intersection tests over GPU. Our experimental results with three different real datasets show up to 39-fold end-to- end speedup versus optimized sequential routines of GEOS C++ library as well as PostgreSQL spatial database with PostGIS.
- Aji, A., Teodoro, G., and Wang, F. Haggis: Turbocharge a mapreduce based spatial data warehousing system with GPU engine. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (2014), ACM, pp. 15--20. Google ScholarDigital Library
- Arge, L., Procopiuc, O., Ramaswamy, S., Suel, T., and Vitter, J. S. Scalable sweeping-based spatial join. In VLDB (1998), vol. 98, Citeseer, pp. 570--581.Google Scholar
- Audet, S., Albertsson, C., Murase, M., and Asahara, A. Robust and efficient polygon overlay on parallel stream processors. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (2013), ACM, pp. 304--313. Google ScholarDigital Library
- Civilmap. Available at: https://blog.civilmaps.com/the-lidar-data-crunch/.Google Scholar
- García, Y. J., Lopez, M. A., and Leutenegger, S. T. On optimal node splitting for R-trees. In Proceedings of the 24rd International Conference on Very Large Data Bases (1998), Morgan Kaufmann Publishers Inc., pp. 334--344.Google ScholarDigital Library
- Jacox, E. H., and Samet, H. Spatial join techniques. ACM Transactions on Database Systems (TODS) 32, 1 (2007), 7.Google Scholar
- Leutenegger, S. T., Lopez, M., Edgington, J., Et Al. Str: A simple and efficient algorithm for R-tree packing. In Data Engineering, 1997. Proceedings. 13th International Conference on (1997), IEEE, pp. 497--506.Google ScholarCross Ref
- Lieberman, M. D., Sankaranarayanan, J., and Samet, H. A fast similarity join algorithm using graphics processing units. In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on (2008), IEEE, pp. 1111--1120. Google ScholarDigital Library
- Luo, L., Wong, M. D., and Leong, L. Parallel implementation of R-trees on the GPU. In Design Automation Conference (ASP-DAC), 2012 17th Asia and South Pacific (2012), IEEE, pp. 353--358.Google ScholarCross Ref
- Mckenney, M., and Mcguire, T. A Parallel plane sweep algorithm for multi-core systems. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (2009), ACM, pp. 392--395. Google ScholarDigital Library
- Pavlovic, M., Tauheed, F., Heinis, T., and Ailamakit, A. GIPSY: joining spatial datasets with contrasting density. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management (2013), ACM, p. 11. Google ScholarDigital Library
- Postgis. http://postgis.net/.Google Scholar
- Prasad, S. K., Mcdermott, M., He, X., and Puri, S. GPU-based parallel R-tree construction and querying. In Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International (2015), IEEE, pp. 618--627.Google ScholarDigital Library
- Prasad, S. K., Mcdermott, M., Puri, S., Shah, D., Aghajarian, D., Shekhar, S., and Zhou, X. a vision for GPU-accelerated parallel computation on geo-spatial datasets. SIGSPATIAL Special 6, 3 (2015), 19--26. Google ScholarDigital Library
- Puri, S., and Prasad, S. K. MPI-GIS: New parallel overlay algorithm and system prototype.Google Scholar
- Puri, S., and Prasad, S. K. A parallel algorithm for clipping polygons with improved bounds and a distributed overlay processing system using MPI. In Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on (2015), IEEE, pp. 576--585. Google ScholarDigital Library
- Shimrat, M. Algorithm 112: position of point relative to polygon. Communications of the ACM 5, 8 (1962), 434.Google Scholar
- Simion, B., Ray, S., and Brown, A. D. Speeding up spatial database query execution using gpus. Procedia Computer Science 9 (2012), 1870--1879. Google ScholarCross Ref
- Stonebraker, M., Frew, J., Gardels, K., and Meredith, J. The sequoia 2000 storage benchmark. In ACM SIGMOD Record (1993), vol. 22, ACM, pp. 2--11. Google ScholarDigital Library
- Yampaka, T., and Chongstitvatana, P. Spatial join with R-tree on graphics processing units. KMUTNB: International Journal of Applied Science and Technology 5, 3 (2013), 1--7.Google Scholar
- You, S., Zhang, J., and Gruenwald, L. Parallel spatial query processing on gpus using R-trees. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (2013), ACM, pp. 23--31. Google ScholarDigital Library
- You, S., Zhang, J., and Gruenwald, L. Scalable and efficient spatial data management on multi-core CPU and GPU clusters: A preliminary implementation based on impala. In Data Engineering Workshops (ICDEW), 2015 31st IEEE International Conference on (2015), IEEE, pp. 143--148. Google ScholarCross Ref
- Zhang, J., and You, S. Cudagis: report on the design and realization of a massive data parallel GIS on GPUs. In Proceedings of the Third ACM SIGSPATIAL International Workshop on GeoStreaming (2012), ACM, pp. 101--108. Google ScholarDigital Library
- Zhang, J., and You, S. Speeding up large-scale point-in-polygon test based spatial join on GPUs. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (2012), ACM, pp. 23--32. Google ScholarDigital Library
- Zhang, J., you, S., and Gruenwald, L. High-performance spatial query processing on big taxi trip data using gpgpus. In Big Data (BigData Congress), 2014 IEEE International Congress on (2014), IEEE, pp. 72--79.Google ScholarDigital Library
- Zhou, X., Abel, D. J., and Truffet, D. Data partitioning for parallel spatial join processing. Geoinformatica 2, 2 (1998), 175--204. Google ScholarDigital Library
Index Terms
- GCMF: an efficient end-to-end spatial join system over large polygonal datasets on GPGPU platform
Recommendations
Speeding up large-scale point-in-polygon test based spatial join on GPUs
BigSpatial '12: Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial DataPoint-in-Polygon (PIP) test is fundamental to spatial databases and GIS. Motivated by the slow response times in joining large-scale point locations with polygons using traditional spatial databases and GIS, we have designed and developed an end-to-end ...
Clustering billions of data points using GPUs
UCHPC-MAW '09: Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshopIn this paper, we report our research on using GPUs to accelerate clustering of very large data sets, which are common in today's real world applications. While many published works have shown that GPUs can be used to accelerate various general purpose ...
MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data
ICPP '18: Proceedings of the 47th International Conference on Parallel ProcessingIn recent times, geospatial datasets are growing in terms of size, complexity and heterogeneity. High performance systems are needed to analyze such data to produce actionable insights in an efficient manner. For polygonal a.k.a vector datasets, ...
Comments