skip to main content
10.1145/1653771.1653815acmconferencesArticle/Chapter ViewAbstractPublication PagesgisConference Proceedingsconference-collections
research-article

Efficiently managing large-scale raster species distribution data in PostgreSQL

Published:04 November 2009Publication History

ABSTRACT

Species distribution data play an important role in biodiversity related research, especially in exploring relationships with the environment. In the recent years, both the number of species being explored and the spatial resolution of species distribution data are increasing fast. It is thus imperative to develop database systems that allow users to efficiently query such large-scale data based on spatial and non-spatial (e.g., taxonomic and phylogenetics) criteria.

In this paper, we present our approach to building such a system by integrating several components, including a quadtree representation of binary raster data, tree path indexing and query processing in PostgreSQL, and window decomposition techniques for spatial queries. Our unique contribution is in associating species identifiers with intermediate quadtree nodes and query optimization for multiple independent queries after window query decomposition. Our system enables PostgreSQL to support binary raster data without requiring any changes to the database backend and is suitable for managing large-scale species distribution data.

Our experiments using 4000+ bird species distribution data related to the Western hemisphere show that the proposed approach in associating species identifiers with quadtree nodes reduces the number of database tuples by more than 1/3 and the average identifiers to be associated with each tuple from 110.6 to 4.8, a significant improvement compared to classic quadtree-based approaches. With respect to query optimization, optimized queries are 6--9.5 times faster than the baseline queries for average query response times and 5.5--8.3 times faster than the baseline queries for maximum query response times for four query window sizes ranging from 0.1 to 5.0 degrees. Our query optimization techniques thus make the system suitable for many interactive applications for querying and exploring species distribution data.

References

  1. A. Aboulnaga and W. G. Aref: Window query processing in linear quadtrees. Distributed and Parallel Databases 10(2):111--126, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. W. G. Aref and I. F. Ilyas: SP-GiST: An extensible database index for supporting space partitioning trees. Journal of Intelligent Information Systems 17(2--3):215--240, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. W. G. Aref and H. Samet: Decomposing a Window into Maximal Quadtree Blocks. Acta Informatica 30(5):425--439, 1993.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. W. G. Aref and H. Samet: Efficient Window Block Retrieval in Quadtree-Based Spatial Databases. Geoinformatica 1(1):59--91, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. SEDAC Species Distribution Grid. http://sedac.ciesin.columbia.edu/species/.Google ScholarGoogle Scholar
  6. Distributed Generic Information Retrieval (DiGIR). http://digir.sourceforge.net.Google ScholarGoogle Scholar
  7. Biodiversity Information Standards (TDWG). http://www.tdwg.org.Google ScholarGoogle Scholar
  8. M. Y. Eltabakh, R. Eltarras, and W. G. Aref: Space-Partitioning Trees in PostgreSQL: Realization and Performance. In Proc. 22nd International Conference on Data Engineering, IEEE Computer Society, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Esperanca, and H. Samet: Experience with SAND-Tcl: A scripting tool for spatial databases. Journal of Visual Languages and Computing 13(2):229--255, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Fang, M. Friedman, G. Nair, M. Rys, and A. F. Schmid: Spatial indexing in Microsoft SQL server 2008. In Proc. 2008 ACM SIGMOD International Conference on Management of Data, 1207--1216, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. V. Gaede and O. Günther: Multidimensional access methods. ACM Computing Surveys 30(2):170--231, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. I. Gargantini: An Effective Way to Represent Quadtrees. Communications of the ACM 25(2):905--910, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Global Biodiversity Information Facility. http://www.gbif.org.Google ScholarGoogle Scholar
  14. K. He and J. Zhang: Testing the correlation between beta diversity and differences in productivity among global ecoregions, biomes, and biogeographical realms. Ecological Informatics 4(2):93--98, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  15. HerpNET: A new tool for the study and conservation of biodiversity. http://www.herpnet.org.Google ScholarGoogle Scholar
  16. G. M. Hunter and K. Steiglitz: Operations on Images Using Quad Trees. IEEE Transactions on Pattern Analysis and Machine Intelligence 1(2):145--153, 1979.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. K. V. Kothuri, S. Ravada, and D. Abugov: Quadtree and R-tree indexes in Oracle spatial: a comparison using GIS data. In Proc. 2002 ACM SIGMOD international conference on management of data, 546--557, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Kothuri, A. Godfrind, and E. Beinat: Pro Oracle Spatial for Oracle Database 11g. Apress, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. Manolopoulos, A. Nanopoulos, A. N. Papadopoulos, and Y. Theodoridis: R-Trees: Theory and Applications. Springer, New York, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Mammal Networked Information System )MaNIS. http://manisnet.org/.Google ScholarGoogle Scholar
  21. D. M. Mark and D. J. Abel: Linear Quadtrees from Vector Representations of Polygons. IEEE Transactions on Pattern Analysis and Machine Intelligence 7(3):344--349, 1985.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. G. M. Morton: A computer oriented geodetic data base and a new technique in file sequencing. Technical Report, IBM, Ottawa, Canada, 1966.Google ScholarGoogle Scholar
  23. NatureServe: A Network Connecting Science with Conservation. http://www.natureserve.org/getData/animalData.jsp.Google ScholarGoogle Scholar
  24. Ocean Biographic Information System (OBIS). http://www.iobis.org/.Google ScholarGoogle Scholar
  25. ORNithological Information System (ORNIS). http://olla.berkeley.edu/ornisnet/.Google ScholarGoogle Scholar
  26. LTREE Module for PostgreSQL http://www.postgresql.org/docs/current/static/ltree.html.Google ScholarGoogle Scholar
  27. G. Proietti: An optimal algorithm for decomposing a window into maximal quadtree blocks. Acta Informatica 36(4):257--266, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  28. rasdaman: The Intelligent RasterServer. http://www.rasdaman.com.Google ScholarGoogle Scholar
  29. H. Samet: The Quadtree and Related Hierarchical Data-Structures. ACM Computing Surveys 16(2):187--260, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. H. Samet, A. Rosenfeld, C. A. Shaffer, and R. E. Webber: A Geographic Information-System Using Quadtrees. Pattern Recognition 17(6):647--656, 1984.Google ScholarGoogle ScholarCross RefCross Ref
  31. H. Samet: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. H. Samet and R. E. Webber: Extending the SAND spatial database system for the visualization of three-dimensional scientific data. Geographical Analysis 36, 87--101, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  33. C. A. Shaffer, H. Samet, and R. C. Nelson: QUILT: a geographic information system based on quadtrees. International Journal of Geographical Information Systems, Volume 4(2):103--31, 1990.Google ScholarGoogle ScholarCross RefCross Ref
  34. Y. H. Tsai, K. L. N. Chung, and W. Y. Chen: A strip-splitting-based optimal algorithm for decomposing a query window into maximal quadtree blocks. IEEE Transactions on Knowledge and Data Engineering 16(4):519--523, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Digital Representations of Tree Species Range Maps from "Atlas of United States Trees" by Elbert L. Little, Jr. http://esp.cr.usgs.gov/data/atlas/little/.Google ScholarGoogle Scholar
  36. J. Zhang and L. Gruenwald: Embedding and extending GIS for exploratory analysis of large-scale species distribution data. In Proc. 16th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 28, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. Zhang, D. D. Pennington, and X. Liu: GBD-Explorer: Extending open source Java GIS for exploring ecoregion-based biodiversity data. Ecological Informatics 2(2):94--102, 2007.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Efficiently managing large-scale raster species distribution data in PostgreSQL

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        GIS '09: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
        November 2009
        575 pages
        ISBN:9781605586496
        DOI:10.1145/1653771

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 November 2009

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate220of1,116submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader