DOI: 10.1145/3139958.3139963
research-article

Spatio-Temporal Join on Apache Spark

Published: 7 November 2017

ABSTRACT

The need to process extremely large volumes of spatial data effectively has led many organizations to adopt distributed processing frameworks. Apache Spark is one such open-source framework that is enjoying widespread adoption. Most observational data (i.e., data collected by moving or stationary sensors) has a temporal component, or timestamp. To perform advanced analytics and gain insights, the temporal component becomes as important as the spatial and attribute components. In this paper, we detail several variants of a spatial join operation that address spatial, temporal, and attribute-based joins. Our spatial join technique differs from other approaches in that it combines spatial, temporal, and attribute predicates in the join operator.
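As a rough illustration of what combining the three predicate types in one join operator can mean, the following is a minimal sketch in plain Python rather than Spark. It is not the paper's algorithm: the `Observation` type, the `st_join` function, the distance threshold, the time window, and the equality test on a `category` attribute are all assumptions made for this example, and the nested loop ignores the partitioning and indexing a distributed implementation would need.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import hypot


@dataclass(frozen=True)
class Observation:
    # Hypothetical record: planar coordinates, a timestamp, one attribute.
    x: float
    y: float
    time: datetime
    category: str


def st_join(left, right, max_distance, max_time_gap):
    """Naive nested-loop join (illustrative only).

    A pair is emitted only when all three assumed predicates hold:
    spatial (within max_distance), temporal (within max_time_gap),
    and attribute (equal category).
    """
    pairs = []
    for a in left:
        for b in right:
            near_in_space = hypot(a.x - b.x, a.y - b.y) <= max_distance
            near_in_time = abs(a.time - b.time) <= max_time_gap
            same_attribute = a.category == b.category
            if near_in_space and near_in_time and same_attribute:
                pairs.append((a, b))
    return pairs
```

Evaluating the three tests together in a single pass, rather than running a spatial join and filtering the result afterwards, avoids materializing pairs that are close in space but fail the temporal or attribute test.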

In addition, our spatio-temporal join algorithm and implementation differ from others in that they run in a commercial off-the-shelf (COTS) application. The users of this functionality are assumed to be GIS analysts with little, if any, knowledge of the implementation details of spatio-temporal joins or distributed processing. They are comfortable using simple tools that do not provide the ability to tweak the configuration of the

      • Published in

        SIGSPATIAL '17: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
        November 2017
        677 pages
        ISBN:9781450354905
        DOI:10.1145/3139958

        Copyright © 2017 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        SIGSPATIAL '17 Paper Acceptance Rate: 39 of 193 submissions, 20%
        Overall Acceptance Rate: 220 of 1,116 submissions, 20%
