ABSTRACT
Species distribution data play an important role in biodiversity related research, especially in exploring relationships with the environment. In the recent years, both the number of species being explored and the spatial resolution of species distribution data are increasing fast. It is thus imperative to develop database systems that allow users to efficiently query such large-scale data based on spatial and non-spatial (e.g., taxonomic and phylogenetics) criteria.
In this paper, we present our approach to building such a system by integrating several components, including a quadtree representation of binary raster data, tree path indexing and query processing in PostgreSQL, and window decomposition techniques for spatial queries. Our unique contribution is in associating species identifiers with intermediate quadtree nodes and query optimization for multiple independent queries after window query decomposition. Our system enables PostgreSQL to support binary raster data without requiring any changes to the database backend and is suitable for managing large-scale species distribution data.
Our experiments using 4000+ bird species distribution data related to the Western hemisphere show that the proposed approach in associating species identifiers with quadtree nodes reduces the number of database tuples by more than 1/3 and the average identifiers to be associated with each tuple from 110.6 to 4.8, a significant improvement compared to classic quadtree-based approaches. With respect to query optimization, optimized queries are 6--9.5 times faster than the baseline queries for average query response times and 5.5--8.3 times faster than the baseline queries for maximum query response times for four query window sizes ranging from 0.1 to 5.0 degrees. Our query optimization techniques thus make the system suitable for many interactive applications for querying and exploring species distribution data.
- A. Aboulnaga and W. G. Aref: Window query processing in linear quadtrees. Distributed and Parallel Databases 10(2):111--126, 2001. Google ScholarDigital Library
- W. G. Aref and I. F. Ilyas: SP-GiST: An extensible database index for supporting space partitioning trees. Journal of Intelligent Information Systems 17(2--3):215--240, 2001. Google ScholarDigital Library
- W. G. Aref and H. Samet: Decomposing a Window into Maximal Quadtree Blocks. Acta Informatica 30(5):425--439, 1993.Google ScholarDigital Library
- W. G. Aref and H. Samet: Efficient Window Block Retrieval in Quadtree-Based Spatial Databases. Geoinformatica 1(1):59--91, 1997. Google ScholarDigital Library
- SEDAC Species Distribution Grid. http://sedac.ciesin.columbia.edu/species/.Google Scholar
- Distributed Generic Information Retrieval (DiGIR). http://digir.sourceforge.net.Google Scholar
- Biodiversity Information Standards (TDWG). http://www.tdwg.org.Google Scholar
- M. Y. Eltabakh, R. Eltarras, and W. G. Aref: Space-Partitioning Trees in PostgreSQL: Realization and Performance. In Proc. 22nd International Conference on Data Engineering, IEEE Computer Society, 2006. Google ScholarDigital Library
- C. Esperanca, and H. Samet: Experience with SAND-Tcl: A scripting tool for spatial databases. Journal of Visual Languages and Computing 13(2):229--255, 2002.Google ScholarDigital Library
- Y. Fang, M. Friedman, G. Nair, M. Rys, and A. F. Schmid: Spatial indexing in Microsoft SQL server 2008. In Proc. 2008 ACM SIGMOD International Conference on Management of Data, 1207--1216, 2008. Google ScholarDigital Library
- V. Gaede and O. Günther: Multidimensional access methods. ACM Computing Surveys 30(2):170--231, 1998. Google ScholarDigital Library
- I. Gargantini: An Effective Way to Represent Quadtrees. Communications of the ACM 25(2):905--910, 1982. Google ScholarDigital Library
- Global Biodiversity Information Facility. http://www.gbif.org.Google Scholar
- K. He and J. Zhang: Testing the correlation between beta diversity and differences in productivity among global ecoregions, biomes, and biogeographical realms. Ecological Informatics 4(2):93--98, 2009.Google ScholarCross Ref
- HerpNET: A new tool for the study and conservation of biodiversity. http://www.herpnet.org.Google Scholar
- G. M. Hunter and K. Steiglitz: Operations on Images Using Quad Trees. IEEE Transactions on Pattern Analysis and Machine Intelligence 1(2):145--153, 1979.Google ScholarDigital Library
- R. K. V. Kothuri, S. Ravada, and D. Abugov: Quadtree and R-tree indexes in Oracle spatial: a comparison using GIS data. In Proc. 2002 ACM SIGMOD international conference on management of data, 546--557, 2002. Google ScholarDigital Library
- R. Kothuri, A. Godfrind, and E. Beinat: Pro Oracle Spatial for Oracle Database 11g. Apress, 2007. Google ScholarDigital Library
- Y. Manolopoulos, A. Nanopoulos, A. N. Papadopoulos, and Y. Theodoridis: R-Trees: Theory and Applications. Springer, New York, 2005. Google ScholarDigital Library
- Mammal Networked Information System )MaNIS. http://manisnet.org/.Google Scholar
- D. M. Mark and D. J. Abel: Linear Quadtrees from Vector Representations of Polygons. IEEE Transactions on Pattern Analysis and Machine Intelligence 7(3):344--349, 1985.Google ScholarDigital Library
- G. M. Morton: A computer oriented geodetic data base and a new technique in file sequencing. Technical Report, IBM, Ottawa, Canada, 1966.Google Scholar
- NatureServe: A Network Connecting Science with Conservation. http://www.natureserve.org/getData/animalData.jsp.Google Scholar
- Ocean Biographic Information System (OBIS). http://www.iobis.org/.Google Scholar
- ORNithological Information System (ORNIS). http://olla.berkeley.edu/ornisnet/.Google Scholar
- LTREE Module for PostgreSQL http://www.postgresql.org/docs/current/static/ltree.html.Google Scholar
- G. Proietti: An optimal algorithm for decomposing a window into maximal quadtree blocks. Acta Informatica 36(4):257--266, 1999.Google ScholarCross Ref
- rasdaman: The Intelligent RasterServer. http://www.rasdaman.com.Google Scholar
- H. Samet: The Quadtree and Related Hierarchical Data-Structures. ACM Computing Surveys 16(2):187--260, 1984. Google ScholarDigital Library
- H. Samet, A. Rosenfeld, C. A. Shaffer, and R. E. Webber: A Geographic Information-System Using Quadtrees. Pattern Recognition 17(6):647--656, 1984.Google ScholarCross Ref
- H. Samet: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., 2005. Google ScholarDigital Library
- H. Samet and R. E. Webber: Extending the SAND spatial database system for the visualization of three-dimensional scientific data. Geographical Analysis 36, 87--101, 2006.Google ScholarCross Ref
- C. A. Shaffer, H. Samet, and R. C. Nelson: QUILT: a geographic information system based on quadtrees. International Journal of Geographical Information Systems, Volume 4(2):103--31, 1990.Google ScholarCross Ref
- Y. H. Tsai, K. L. N. Chung, and W. Y. Chen: A strip-splitting-based optimal algorithm for decomposing a query window into maximal quadtree blocks. IEEE Transactions on Knowledge and Data Engineering 16(4):519--523, 2004. Google ScholarDigital Library
- Digital Representations of Tree Species Range Maps from "Atlas of United States Trees" by Elbert L. Little, Jr. http://esp.cr.usgs.gov/data/atlas/little/.Google Scholar
- J. Zhang and L. Gruenwald: Embedding and extending GIS for exploratory analysis of large-scale species distribution data. In Proc. 16th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 28, 2008. Google ScholarDigital Library
- J. Zhang, D. D. Pennington, and X. Liu: GBD-Explorer: Extending open source Java GIS for exploring ecoregion-based biodiversity data. Ecological Informatics 2(2):94--102, 2007.Google ScholarCross Ref
Index Terms
- Efficiently managing large-scale raster species distribution data in PostgreSQL
Recommendations
SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases
ICDE '09: Proceedings of the 2009 IEEE International Conference on Data EngineeringA paramount challenge in probabilistic databases is the scalable computation of confidences of tuples in query results. This paper introduces an efficient secondary-storage operator for exact computation of queries on tuple-independent probabilistic ...
Comparison of Different Solutions for Solving the Optimization Problem of Large Join Queries
DBKDA '10: Proceedings of the 2010 Second International Conference on Advances in Databases, Knowledge, and Data ApplicationsThe article explores the optimization of queries using genetic algorithms and compares it with the conventional query optimization component. Genetic algorithms (GAs), as a data mining technique, have been shown to be a promising technique in solving ...
Synchronous incremental update of materialized views for PostgreSQL
Materialized views are logically excess stored query results in SQL-oriented databases. This technology can significantly improve the performance of database systems. Although the idea of materialized views came up in the 1980s, only three database ...
Comments